Moment-KV: Momentum-Based Decode-Time KV Cache Compression for Long Generation (Event)

Name: Moment-KV: Momentum-Based Decode-Time KV Cache Compression for Long Generation
Start: 2026-05-29

PRODUCT_LAUNCH · 2026-05-29 · Impact: MEDIUM

arXiv:2605.29873v1 Announce Type: new

Abstract: Key-Value (KV) cache remains a major bottleneck for deploying Large Language Models (LLMs) in long-generation tasks. Prior work often applies uniform compression across both prefill and decoding caches, but compressing the prefill cache degrades performance by corrupting critical context. While preserving the prefill cache is essential, decoding-phase compression remain

Artificial Intelligence
