IndexMem: Learned KV-Cache Eviction with Latent Memory for Long-Context LLM Inference
Event: PRODUCT_LAUNCH | Date: 2026-05-26 | Impact: MEDIUM
arXiv:2605.25475v1 Announce Type: new
Abstract: Large Language Models (LLMs) are increasingly expected to operate over long contexts, yet standard softmax attention incurs a KV cache that grows linearly with sequence length, quickly becoming the bottleneck for long-context inference. A practical remedy is to evict less important KV entries; however, existing eviction policies are largely heuristic and struggle