Event: NestedKV: Nested Memory Routing for Long-Context KV Cache Compression

PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM

arXiv:2605.26678v1
Announce Type: new

Abstract: Long-context language models are limited by the memory footprint of the key-value (KV) cache. Existing training-free KV compression methods usually rank tokens by a single importance signal (attention, recency, layer-wise allocation, or key distinctiveness), which becomes brittle when useful context is globally distinctive, locally episodic, or immediately relevant. We introduce N
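To make the single-signal baseline that the abstract criticizes more concrete, here is a minimal sketch of budgeted KV eviction. Each cached token gets one score, and only the `budget` highest-scoring tokens are kept. The function names, the per-token attention-mass values, and the budget are hypothetical and chosen for illustration only. This is not NestedKV's routing method, which the truncated abstract does not describe.

```python
from typing import List, Sequence


def keep_top_k(scores: Sequence[float], budget: int) -> List[int]:
    """Return indices of the `budget` highest-scoring tokens, in original order.

    This models a generic single-signal KV eviction policy: one score per
    cached token, and everything outside the top `budget` is evicted.
    """
    if budget >= len(scores):
        return list(range(len(scores)))
    # Rank token positions by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the survivors in sequence order, as a KV cache would store them.
    return sorted(ranked[:budget])


def recency_scores(num_tokens: int) -> List[float]:
    """Hypothetical recency signal: later positions score higher."""
    return [float(i) for i in range(num_tokens)]


# Hypothetical accumulated attention mass for 6 cached tokens.
attention_mass = [0.9, 0.05, 0.4, 0.1, 0.3, 0.25]
budget = 3

kept_by_attention = keep_top_k(attention_mass, budget)
kept_by_recency = keep_top_k(recency_scores(len(attention_mass)), budget)

print("attention keeps:", kept_by_attention)  # [0, 2, 4]
print("recency keeps:  ", kept_by_recency)    # [3, 4, 5]
```

With the same budget, the two signals keep almost disjoint token sets: only token 4 survives both. A cache that relies on either ranking alone would drop context that the other signal considers essential. This is the kind of disagreement the abstract describes as making single-signal ranking brittle.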