PolarQuant: Leveraging Polar Transformation for Efficient Key Cache Quantization and Decoding Acceleration Event

PRODUCT_LAUNCH 2026-06-08 Impact: MEDIUM

arXiv:2502.00527v2 Announce Type: replace-cross

Abstract: The KV cache in large language models is a dominant factor in memory usage, limiting their broader applicability. Quantizing the cache to lower bit widths is an effective way to reduce computational costs; however, previous methods struggle with quantizing key vectors due to outliers, resulting in excessive overhead. We propose a no
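
The outlier problem mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the PolarQuant method, whose description is cut off above. It is a plain min-max uniform quantizer, and the helper names (`quantize_dequantize`, `mean_abs_error`) and the sample values are hypothetical, chosen only to show how a single large entry stretches the quantization range and raises the error on every other entry.

```python
def quantize_dequantize(values, bits):
    """Uniform min-max quantization to `bits` bits, then dequantization.

    Illustrative baseline only; not the scheme proposed in the paper.
    """
    levels = (1 << bits) - 1            # number of quantization steps
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Map each value to the nearest integer code, then back to a float.
    return [round((v - lo) / scale) * scale + lo for v in values]


def mean_abs_error(values, bits):
    """Average absolute reconstruction error after quantization."""
    restored = quantize_dequantize(values, bits)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Hypothetical key-vector entries (made-up numbers, not real model data).
regular = [-0.9, -0.5, -0.2, 0.1, 0.4, 0.8, 1.0, -1.0]
# Same entries, but the last one is replaced by a large outlier.
with_outlier = regular[:-1] + [40.0]

err_regular = mean_abs_error(regular, bits=4)
err_outlier = mean_abs_error(with_outlier, bits=4)

print(f"4-bit error without outlier: {err_regular:.4f}")
print(f"4-bit error with outlier:    {err_outlier:.4f}")
```

With the outlier present, the step size grows with the widened range, so the small entries collapse onto one or two codes. Avoiding this usually requires extra bookkeeping, such as per-group scales or separate handling of outliers, which is one plausible reading of the "excessive overhead" the abstract refers to.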