PolarQuant: Leveraging Polar Transformation for Efficient Key Cache Quantization and Decoding Acceleration 文章

ArXiv CS.CL2026-06-08NEWSen作者: Songhao Wu, Ang Lv, Xiao Feng, Yufei Zhang, Xun Zhang, Guojun Yin, Wei Lin, Rui Yan

摘要

arXiv:2502.00527v2 Announce Type: replace-cross Abstract: The KV cache in large language models is a dominant factor in memory usage, limiting their broader applicability. Quantizing the cache to lower bit widths is an effective way to reduce computational costs; however, previous methods struggle with quantizing key vectors due to outliers, resulting in excessive overhead. We propose a novel quantization approach called PolarQuant, which efficiently addresses the outlier challenge. We observe that outliers typically appear in only one of two dimensions, which are rotated together by a specific angle when rotary position embeddings are applied. When represented as two-dimensional vectors, these dimensions exhibit well-structured patterns, with radii and angles smoothly distributed in polar coordinates. This alleviates the challenge of outliers on per-channel quantization, making them well-suited for quantization.

PolarQuant: Leveraging Polar Transformation for Efficient Key Cache Quantization and Decoding Acceleration 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品

相关技术查看全部 (3)