Grammatically-Guided Sparse Attention for Efficient and Interpretable Transformers

ArXiv CS.CL · 2026-05-26 · News (English) · Author: Spandan Pratyush

Abstract

arXiv:2605.24518v1 (Announce Type: new)

The quadratic complexity of self-attention in Transformer models remains a significant bottleneck for processing long sequences and for deploying large language models efficiently. To address this, there has been substantial research into sparse attention; DeepSeek Sparse Attention, for example, combines several methods of grouping tokens into segments to reduce time complexity. This paper introduces a novel approach, Grammatically-Guided Sparse Attention, which constrains attention computations according to the grammatical roles of tokens. Using Part-of-Speech (POS) tags, it dynamically generates attention masks that permit only linguistically coherent connections between tokens, shrinking the computational graph without sacrificing essential linguistic dependencies.
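The abstract says attention masks are derived from POS tags but does not state which grammatical roles may attend to each other. The sketch below is a minimal illustration under stated assumptions: the compatibility table `ALLOWED_PAIRS`, the helper names `pos_attention_mask` and `masked_attention`, and the Universal Dependencies-style tag labels are all hypothetical, not taken from the paper. It builds a per-sentence boolean mask from POS tags and applies it inside ordinary scaled dot-product attention with NumPy.

```python
import numpy as np

# HYPOTHETICAL compatibility table: (query POS, key POS) pairs allowed to attend.
# Illustrative assumption only; the paper's actual rules are not given in the abstract.
ALLOWED_PAIRS = {
    ("DET", "NOUN"), ("NOUN", "DET"),
    ("ADJ", "NOUN"), ("NOUN", "ADJ"),
    ("NOUN", "VERB"), ("VERB", "NOUN"),
    ("ADV", "VERB"), ("VERB", "ADV"),
}


def pos_attention_mask(pos_tags):
    """Boolean mask where mask[i, j] is True if token i may attend to token j."""
    n = len(pos_tags)
    mask = np.eye(n, dtype=bool)  # each token always attends to itself
    for i, q_tag in enumerate(pos_tags):
        for j, k_tag in enumerate(pos_tags):
            if (q_tag, k_tag) in ALLOWED_PAIRS:
                mask[i, j] = True
    return mask


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention in which disallowed (i, j) pairs get zero weight."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)  # block non-grammatical links
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # exp(-inf) == 0 for masked entries
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Example sentence: "the quick fox runs fast" with hypothetical POS tags.
tags = ["DET", "ADJ", "NOUN", "VERB", "ADV"]
mask = pos_attention_mask(tags)
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(len(tags), 8)) for _ in range(3))
out, weights = masked_attention(q, k, v, mask)
print(int(mask.sum()), "of", mask.size, "query-key pairs kept")  # prints: 13 of 25 query-key pairs kept
```

This sketch still computes the full n × n score matrix and only zeroes out masked entries, so it shows the masking logic rather than the speedup; realizing the reduced cost would require a kernel that skips masked query-key pairs entirely. Because the diagonal is always allowed, every row keeps at least one finite score, which keeps the softmax well defined.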
