Spatial Priors via Space Filling Curves for Small and Limited Data Vision Transformers

ArXiv CS.CV | 2026-06-16 | NEWS | en | Authors: Leyla Naz Candogan, Arshia Afzal, Pol Puigdemont, Volkan Cevher

Details

Source site: ArXiv CS.CV
Authors: Leyla Naz Candogan, Arshia Afzal, Pol Puigdemont, Volkan Cevher
Article type: NEWS
Language: en
Publication date: 2026-06-16

Abstract

arXiv:2606.14757v1. Announce type: new. Abstract: Though Vision Transformers (ViTs) have become the dominant backbone in many computer vision tasks, their attention mechanism lacks explicit spatial inductive biases because of permutation equivariance. This becomes particularly important in two settings: when model capacity is small or when training data is limited. Inspired by the attention masking strategies in Linear Transformers and the scanning patterns of Vision SSMs, we introduce VIOLIN, a lightweight masked attention mechanism that encodes spatial structure within attention via Space Filling Curves (SFCs) with less than 0.0015% extra parameters and negligible computational overhead. VIOLIN scans the image using multiple SFCs to construct curve-specific decay masks, which are then combined and multiplied with the attention matrix. Across a wide range of evaluations, VIOLIN consistently improves performance.
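
The abstract describes the mechanism only at a high level, so the following is a minimal NumPy sketch of the general idea and not the authors' implementation. Everything specific here is an assumption made for illustration: the choice of curves (a snake scan and a Hilbert curve), the decay form gamma ** |pos(i) - pos(j)|, averaging as the way masks are combined, applying the mask after the softmax, and the row renormalization. The function names are hypothetical.

```python
import numpy as np

def snake_order(h, w):
    """order[r, c] = position of patch (r, c) along a boustrophedon scan."""
    order = np.arange(h * w).reshape(h, w)
    order[1::2] = order[1::2, ::-1].copy()  # odd rows are traversed right to left
    return order

def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the sub-curve is oriented correctly
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_order(n):
    """order[y, x] = Hilbert position of patch (y, x)."""
    order = np.empty((n, n), dtype=np.int64)
    for y in range(n):
        for x in range(n):
            order[y, x] = hilbert_index(n, x, y)
    return order

def decay_mask(order, gamma):
    """Assumed decay form: M[i, j] = gamma ** |pos(i) - pos(j)|, patches flattened row-major."""
    pos = order.reshape(-1)
    dist = np.abs(pos[:, None] - pos[None, :])
    return gamma ** dist

def combined_mask(orders, gammas):
    """Assumed combination rule: average the curve-specific masks."""
    return np.mean([decay_mask(o, g) for o, g in zip(orders, gammas)], axis=0)

def masked_attention(q, k, v, mask):
    """Softmax attention multiplied elementwise by the mask, then renormalized (assumption)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    attn = attn * mask
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v, attn
```

In this sketch the only curve-specific quantities are the decay rates (one scalar gamma per curve), which is in the spirit of the very small parameter overhead the abstract reports, though how VIOLIN actually parameterizes its masks is not stated there. Patches that are close along at least one curve keep a large mask value, so attention is biased toward spatial neighbors without being restricted to them.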
