SwInception -- Local Attention Meets Convolutions

arXiv cs.CV | 2026-05-29 | Authors: David Hagerman, Roman Naeem, Jakob Lindqvist, Carl Lindström, Fredrik Kahl, Lennart Svensson


Abstract

arXiv:2605.29954v1 (announce type: new)

Sparse vision transformers have gained popularity as efficient encoders for medical volumetric segmentation, with Swin emerging as a prominent choice. Swin uses local attention to reduce complexity and performs well on many tasks, but it still tends to overfit on small datasets. To mitigate this weakness, we propose a novel architecture that further strengthens Swin's inductive bias by introducing Inception blocks in the feed-forward layers. These multi-branch convolutions enable more direct reasoning over local, multi-scale features within the transformer block. We also modify the decoder layers to capture finer details using fewer parameters. Extensive experiments demonstrate a performance improvement on eleven different medical datasets.
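The abstract's point that local attention reduces complexity can be illustrated with a small back-of-the-envelope calculation. The sketch below is not taken from the paper: it only counts query-key pairs for global attention versus attention restricted to non-overlapping windows on a 3D token grid. The function names, the 48 x 48 x 48 grid, and the 8 x 8 x 8 window are hypothetical values chosen for illustration, and the count ignores projections, window shifting, and the number of heads.

```python
from math import prod


def global_attention_pairs(grid_shape):
    """Query-key pairs when every token attends to every other token."""
    n_tokens = prod(grid_shape)
    return n_tokens * n_tokens


def windowed_attention_pairs(grid_shape, window_shape):
    """Query-key pairs when tokens attend only within non-overlapping windows."""
    if len(grid_shape) != len(window_shape):
        raise ValueError("grid and window must have the same number of dimensions")
    if any(size % win for size, win in zip(grid_shape, window_shape)):
        raise ValueError("each grid dimension must be divisible by the window size")
    n_tokens = prod(grid_shape)
    tokens_per_window = prod(window_shape)
    # Each token interacts only with the tokens inside its own window.
    return n_tokens * tokens_per_window


# Hypothetical sizes, assumed only for this illustration.
grid = (48, 48, 48)
window = (8, 8, 8)

full = global_attention_pairs(grid)
local = windowed_attention_pairs(grid, window)
print(f"global pairs:   {full}")
print(f"windowed pairs: {local}")
print(f"reduction:      {full // local}x")
```

Under these assumptions the saving equals the number of windows in the grid, so it grows with the volume size while the per-token cost of windowed attention stays fixed.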
