Which Heads Matter for Reasoning? RL-Guided KV Cache Compression (Event)

PRODUCT_LAUNCH · 2026-05-28 · Impact: MEDIUM

arXiv:2510.08525v3 · Announce Type: replace

Abstract: Reasoning large language models exhibit complex reasoning behaviors via extended chain-of-thought generation that are highly fragile to information loss during decoding, creating critical challenges for KV cache compression. Existing token-dropping methods directly disrupt reasoning chains by removing intermediate steps, while head-reallocation methods, designed for retrieval tas…
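
The contrast the abstract draws between token dropping and head-level allocation can be illustrated with a toy sketch. It assumes a "sink tokens plus sliding window" eviction rule as the token-dropping policy and a hand-picked set of heads that keep their full cache. The function names, the eviction rule, and the head choice are hypothetical illustrations only. This is not the paper's RL-guided method, which the truncated abstract does not describe.

```python
from typing import Dict, List, Set

# Hypothetical plan format: head index -> token positions whose K/V entries are retained.
Plan = Dict[int, List[int]]


def window_keep(seq_len: int, sink: int, window: int) -> List[int]:
    """Positions kept by an assumed sink+sliding-window eviction rule."""
    if sink + window >= seq_len:
        return list(range(seq_len))  # nothing needs to be evicted
    # Keep the first `sink` tokens and the most recent `window` tokens; drop the middle.
    return list(range(sink)) + list(range(seq_len - window, seq_len))


def uniform_token_dropping(num_heads: int, seq_len: int, sink: int, window: int) -> Plan:
    """Every head applies the same eviction, so middle (intermediate-step) tokens vanish everywhere."""
    kept = window_keep(seq_len, sink, window)
    return {h: kept for h in range(num_heads)}


def head_wise_compression(num_heads: int, seq_len: int, full_heads: Set[int],
                          sink: int, window: int) -> Plan:
    """Heads in `full_heads` (chosen by hand here, as an assumption) keep the whole cache;
    all other heads fall back to the sink+window rule."""
    full = list(range(seq_len))
    compressed = window_keep(seq_len, sink, window)
    return {h: (full if h in full_heads else compressed) for h in range(num_heads)}


def cache_size(plan: Plan) -> int:
    """Total number of cached (head, token) entries."""
    return sum(len(positions) for positions in plan.values())


if __name__ == "__main__":
    uniform = uniform_token_dropping(num_heads=8, seq_len=1000, sink=4, window=60)
    mixed = head_wise_compression(num_heads=8, seq_len=1000, full_heads={0, 3}, sink=4, window=60)
    print("full cache entries:", 8 * 1000)
    print("uniform token dropping:", cache_size(uniform))
    print("head-wise compression:", cache_size(mixed))
    print("heads still holding token 500:", sorted(h for h, p in mixed.items() if 500 in p))
```

In this toy setting, uniform dropping keeps 512 of 8,000 entries but loses token 500 in every head. The head-wise plan keeps 2,384 entries and preserves the full history in heads 0 and 3. Deciding which heads deserve the full cache is the question the paper's title poses.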
