Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection · Event

PRODUCT_LAUNCH · 2026-06-01 · Impact: MEDIUM

arXiv:2602.03216v3 · Announce Type: replace

Abstract: The quadratic complexity of attention remains the central bottleneck in long-context inference for large language models. Prior acceleration methods either sparsify the attention map with structured patterns or permanently evict tokens at specific layers, which can retain irrelevant tokens or rely on irreversible early decisions despite the layer-/head-wi

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection · Related Products