Trading Complexity for Expressivity Through Structured Generalized Linear Token Mixing

arXiv cs.CL, 2026-06-01. Authors: Erwan Fagnou, Paul Caillon, Blaise Delattre, Alexandre Allauzen

Abstract

arXiv:2605.31367v1 (Announce Type: cross)

Token mixing layers play a key role in how language models learn and generate long-range dependencies. Their efficiency hinges on a trade-off between decoding speed and memory requirements, including cache size. Focusing on causal generation, this paper explores new trade-offs through a unified framework that separates two key features: (i) the direct influence of inputs on outputs within a single generation step; and (ii) the recurrent propagation of information through past outputs. This framework encompasses major architectures such as attention and state-space models, and it also generalizes the recurrence equations by allowing each state to depend on multiple past states rather than only on its immediate predecessor.
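To make the two paths named in the abstract concrete, here is a minimal scalar sketch in Python. It is not the authors' formulation. The function name `generalized_linear_mix`, the fixed (input-independent) weight lists `direct` and `recur`, and the single-channel setting are all assumptions chosen for illustration. The `direct` weights model path (i), a causal window over current and past inputs, and the `recur` weights model path (ii), an order-K linear recurrence over the K most recent outputs.

```python
from typing import Sequence


def generalized_linear_mix(
    x: Sequence[float],
    direct: Sequence[float],
    recur: Sequence[float],
) -> list[float]:
    """Hypothetical sketch of causal linear token mixing (not the paper's method).

    y[t] = sum_j direct[j] * x[t - j]        # (i) direct input-to-output path
         + sum_k recur[k - 1] * y[t - k]     # (ii) recurrence over K = len(recur) past outputs
    Terms whose index would fall before t = 0 are skipped.
    """
    y: list[float] = []
    for t in range(len(x)):
        # (i) direct influence: weighted sum over the current and previous inputs
        acc = sum(direct[j] * x[t - j] for j in range(len(direct)) if t - j >= 0)
        # (ii) recurrent propagation: weighted sum over up to K previous outputs
        acc += sum(
            recur[k - 1] * y[t - k] for k in range(1, len(recur) + 1) if t - k >= 0
        )
        y.append(acc)
    return y


# First-order recurrence (one past state), driven by an impulse:
print(generalized_linear_mix([1.0, 0.0, 0.0, 0.0], direct=[1.0], recur=[0.5]))
# -> [1.0, 0.5, 0.25, 0.125]

# Second-order recurrence (two past states): the impulse response is Fibonacci-like.
print(generalized_linear_mix([1.0, 0, 0, 0, 0, 0], direct=[1.0], recur=[1.0, 1.0]))
# -> [1.0, 1.0, 2.0, 3.0, 5.0, 8.0]
```

With a single `recur` weight, each output depends only on its immediate predecessor, as in a first-order recurrence; with K > 1 weights, each output depends on several past outputs, which is the kind of generalization the abstract describes. With `recur=[]`, only the direct path remains, here a fixed causal convolution; attention would make those weights input-dependent, which this sketch deliberately omits. In this sketch, decoding one token requires caching the last `len(direct) - 1` inputs and the last `len(recur)` outputs, so both lengths trade expressivity against memory.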
