Forget Attention: Importance-Aware Attention Is All You Need

arXiv cs.CL | 2026-06-03 | Authors: Suhyeong Shin, Yeongwook Yang

Abstract

arXiv:2606.02332v2. Combining attention's global retrieval with the sequential importance signal of state space models (SSMs) is the open challenge of hybrid language modeling. Transformers see everywhere but cannot prioritize; SSMs know what matters but cannot revisit. Existing hybrids, Jamba (block level) and Hymba (head level), place the two in separate compartments, so neither informs the other during the attention computation itself. We propose SISA (SSM-Informed Softmax Attention), which adds an SSM-derived importance term directly inside the attention score and realizes the full operation as a single SDPA (scaled dot-product attention) call on augmented query/key vectors, with no recurrent state and no custom kernel. At 152M / 5B tokens, SISA reaches LAMBADA-greedy 17.3% (vs. 13.9% for Transformer and 15.5% for Mamba-3) and attains 100% on NIAH (needle-in-a-haystack) from step 1K, 7x faster than Transformer's retrieval convergence;
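The abstract does not give SISA's actual formula, so the following is only a minimal sketch of one generic way an additive per-key term can be folded into a single scaled dot-product call through augmented query/key vectors. It assumes the importance term is a scalar per key position that is simply added to the pre-softmax score. The `importance` array is a hypothetical stand-in for whatever the SSM would produce, and all function names are illustrative rather than taken from the paper.

```python
import numpy as np


def _causal_softmax(scores):
    """Mask future positions, then apply a numerically stable row-wise softmax."""
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def causal_sdpa(q, k, v, scale):
    """Plain causal scaled dot-product attention: softmax(scale * q @ k.T) @ v."""
    return _causal_softmax(scale * (q @ k.T)) @ v


def attention_with_key_bias(q, k, v, importance):
    """Reference version: add importance[j] to every score in column j explicitly."""
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = scale * (q @ k.T) + importance[None, :]
    return _causal_softmax(scores) @ v


def augmented_attention(q, k, v, importance):
    """Same result through one plain SDPA call on augmented vectors.

    Assumption (not from the paper): the importance term is a per-key scalar.
    Appending 1/scale to each query and importance[j] to each key gives
    scale * (q_i . k_j + importance[j] / scale) = scale * q_i . k_j + importance[j].
    """
    seq_len, dim = q.shape
    scale = 1.0 / np.sqrt(dim)  # keep the scale of the original head dimension
    q_aug = np.concatenate([q, np.full((seq_len, 1), 1.0 / scale)], axis=-1)
    k_aug = np.concatenate([k, importance[:, None]], axis=-1)
    return causal_sdpa(q_aug, k_aug, v, scale)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(6, 8))
    k = rng.normal(size=(6, 8))
    v = rng.normal(size=(6, 4))
    importance = rng.normal(size=6)  # placeholder for an SSM-derived signal
    same = np.allclose(
        augmented_attention(q, k, v, importance),
        attention_with_key_bias(q, k, v, importance),
    )
    print("augmented call matches explicit bias:", same)
```

Under these assumptions the extra term costs one additional dimension on the query and key and needs no separate bias tensor, which is consistent with the abstract's claim of a single SDPA call without a custom kernel. In a framework such as PyTorch the same idea would need the softmax scale set explicitly, because the default scale would be computed from the augmented dimension.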
