Stochastic Sparse Attention for Memory-Bound Inference (Event)

PRODUCT_LAUNCH · 2026-06-04 · Impact: MEDIUM

arXiv:2605.01910v2. Announce Type: replace-cross.

Abstract: Autoregressive decoding becomes bandwidth-limited at long contexts, because generating each token requires reading all $n_k$ key and value vectors from the KV cache. We present Stochastic Additive No-mulT Attention (SANTA), a method that sparsifies value-cache access by sampling $S \ll n_k$ indices from the post-softmax distribution and aggregating only those value rows. This yields an unbiase
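The sampling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the scaled dot-product scoring, sampling with replacement, and the plain average over sampled value rows are all assumptions made for illustration. Under those assumptions, the average of $S$ value rows drawn from the post-softmax distribution $p$ has expectation $\sum_i p_i v_i$, which is the dense attention output.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


def dense_attention(q, K, V):
    """Reference single-query attention: reads all n_k value rows."""
    p = softmax(K @ q / np.sqrt(q.shape[0]))
    return p @ V


def sampled_attention(q, K, V, num_samples, rng):
    """Hypothetical sketch: estimate attention from num_samples value rows.

    Assumption: indices are drawn i.i.d. (with replacement) from the
    post-softmax distribution, and the sampled value rows are averaged.
    Only the sampled rows of V are read; no multiplication by attention
    weights is needed in the aggregation step.
    """
    p = softmax(K @ q / np.sqrt(q.shape[0]))
    idx = rng.choice(K.shape[0], size=num_samples, replace=True, p=p)
    return V[idx].mean(axis=0)
```

A single call with a small `num_samples` is noisy; averaging many independent estimates should approach the output of `dense_attention`, which is what unbiasedness means here. Note that this sketch still reads every key row to compute the softmax, so it only reduces value-cache reads.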
