Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection Event

Name: Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
Start: 2026-06-01

PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM

arXiv:2602.03216v3 (Announce Type: replace)

Abstract: The quadratic complexity of attention remains the central bottleneck in long-context inference for large language models. Prior acceleration methods either sparsify the attention map with structured patterns or permanently evict tokens at specific layers, which can retain irrelevant tokens or rely on irreversible early decisions despite the layer-/head-wise …
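To make the trade-off in the abstract concrete, here is a minimal NumPy sketch. It is not the paper's method, which the truncated abstract does not describe. It contrasts dense attention, whose (n_q, n_k) score matrix is the quadratic cost, with a generic token-sparse variant that attends over only `keep` selected tokens. The selection heuristic (scoring keys by the logits they receive from the last `window` queries) and all function names are hypothetical assumptions. Because selection is recomputed on every call, for example per layer or per head, a token skipped at one layer is not permanently evicted. That property is the difference from the eviction approaches the abstract criticizes. Masking is omitted for simplicity, so the attention here is non-causal.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def dense_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention (no mask); the (n_q, n_k) scores are the quadratic cost."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def select_tokens(q: np.ndarray, k: np.ndarray, keep: int, window: int = 4) -> np.ndarray:
    """Hypothetical heuristic: score each key by the max logit it receives from the
    last `window` queries, costing O(window * n_k) instead of O(n_q * n_k)."""
    assert 0 < keep <= k.shape[0], "keep must be in (0, n_k]"
    logits = q[-window:] @ k.T / np.sqrt(q.shape[-1])  # (window, n_k)
    importance = logits.max(axis=0)                     # (n_k,)
    top = np.argpartition(importance, -keep)[-keep:]    # unordered top-`keep` indices
    return np.sort(top)                                 # restore original token order


def token_sparse_attention(q, k, v, keep: int, window: int = 4):
    """Attend over only the selected key/value tokens. Selection is recomputed on
    every call, so a token dropped here can be selected again at the next layer."""
    idx = select_tokens(q, k, keep, window)
    return dense_attention(q, k[idx], v[idx]), idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 16, 8
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    out, idx = token_sparse_attention(q, k, v, keep=4)
    print(out.shape, idx)  # output shape is (16, 8); idx holds 4 sorted token indices
```

With `keep` equal to the sequence length, the sparse path reduces exactly to dense attention. Smaller `keep` values cut the attention cost from O(n_q * n_k) to O(n_q * keep) plus the selection cost.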

Artificial Intelligence