You Only Index Once: Cross-Layer Sparse Attention with Shared Routing
Event: PRODUCT_LAUNCH | Date: 2026-06-05 | Impact: MEDIUM
arXiv:2606.06467v1 | Announce Type: new
Abstract: Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings where models generate long intermediate chains of thought. Existing sparse attention methods often face a practical efficiency-quality trade-off. Structured block sparse methods typically provide stronger acceleration but incur noticeable quality loss,