Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation
Event: PRODUCT_LAUNCH | 2026-06-10 | Impact: MEDIUM
arXiv:2606.09864v1 Announce Type: cross
Abstract: Key-value (KV) cache quantization is widely used to reduce the memory footprint of Large Language Model (LLM) inference, yet existing evaluations focus solely on perplexity and accuracy and do not assess the safety impact. In this study, we explore alignment preservation under KV cache quantization. Across eleven instruction-tuned models (3.8B-72B) and five benchmarks (1,894 promp