Event: Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling

PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM

arXiv:2604.18103v2 | Announce Type: replace

Abstract: Prefilling computational costs pose a significant bottleneck for Large Language Models (LLMs) and Large Multimodal Models (LMMs) in long-context settings. While token pruning reduces sequence length, prior methods rely on heuristics that break compatibility with hardware-efficient kernels like FlashAttention. In this work, we observe that toke