Which Heads Matter for Reasoning? RL-Guided KV Cache Compression [Event]
PRODUCT_LAUNCH · 2026-05-28 · Impact: MEDIUM
arXiv:2510.08525v3 (Announce Type: replace)

Abstract: Reasoning large language models exhibit complex reasoning behaviors through extended chain-of-thought generation. These behaviors are highly fragile to information loss during decoding, which creates critical challenges for KV cache compression. Existing token-dropping methods directly disrupt reasoning chains by removing intermediate steps, while head-reallocation methods, designed for retrieval tasks…
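The abstract contrasts two families of KV cache compression. The toy sketch below illustrates only that contrast in plain Python; it is not the paper's RL-guided method. The function names `token_drop` and `head_level_compress`, the made-up importance scores, and the "keep only a recent window" policy for non-selected heads are all hypothetical assumptions chosen for simplicity. Each cache is modeled as the list of token positions a head retains.

```python
from typing import Dict, List, Sequence, Set


def token_drop(scores: Sequence[float], budget: int) -> List[int]:
    """Token-dropping (toy version): keep the `budget` highest-scoring
    token positions, shared by every head.

    Positions outside the budget are evicted from all heads at once, which is
    how an intermediate reasoning step can vanish from the cache entirely.
    The scores are hypothetical per-token importance values.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


def head_level_compress(
    num_heads: int, seq_len: int, full_heads: Set[int], window: int
) -> Dict[int, List[int]]:
    """Head-level compression (toy version): heads in `full_heads` keep every
    position; all other heads keep only the most recent `window` positions.

    Choosing `full_heads` is the hard part; here it is simply given as input.
    """
    recent = range(max(0, seq_len - window), seq_len)
    everything = range(seq_len)
    return {
        h: list(everything) if h in full_heads else list(recent)
        for h in range(num_heads)
    }


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.2, 0.8, 0.3, 0.7]  # hypothetical importance per token
    print(token_drop(scores, budget=3))       # [0, 3, 5]

    plan = head_level_compress(num_heads=4, seq_len=6, full_heads={1}, window=2)
    print(plan[1], plan[0])                   # [0, 1, 2, 3, 4, 5] [4, 5]
    # Total cached entries: 6 (full head) + 3 heads * 2 = 12, versus 4 * 6 = 24 uncompressed.
    print(sum(len(v) for v in plan.values()))  # 12
```

The design point the sketch makes concrete: token-dropping removes the same positions from every head, whereas head-level compression preserves the complete history in a few selected heads and compresses the rest, so the question of which heads to keep becomes the central decision.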
Source: arXiv cs.CL, 2026-05-28