GRKV: Global Regression for Training-Free KV Cache Compression in Long-Context LLMs

Event: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM

arXiv:2605.31105v1 | Announce Type: new

Abstract: Large language models (LLMs) with extended context lengths rely on the key-value (KV) cache to support attention over prior tokens. However, maintaining the KV cache incurs substantial memory overhead, motivating KV-cache compression methods that enforce a fixed budget through eviction and merging. Modern eviction methods increasingly adopt span-based retention bec