Event: Hurwitz Quaternion Multiplicative Quantization for KV Cache Compression

PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM

arXiv:2605.27646v1 (announce type: cross)

Abstract: We propose Hurwitz Quaternion Multiplicative Quantization (HQMQ), a calibration-free method for KV cache compression of large language models. HQMQ treats each 4-element chunk of K or V as a quaternion and quantizes its unit direction to the product $q_p \cdot q_s$, where $q_p$ ranges over the 24-element Hurwitz group $2T$ (the 24 vertices of the 24-
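
The abstract's first factor, $q_p \in 2T$, can be illustrated with a small sketch. It enumerates the 24 unit Hurwitz quaternions, checks that they are closed under the Hamilton product, and snaps the unit direction of a 4-element chunk to the nearest group element. This is not the HQMQ method itself: the second factor $q_s$ is cut off in the text above and is not modeled here, storing the norm separately is an assumption, and the function names (`hurwitz_units`, `qmul`, `quantize_direction`, `dequantize`) are hypothetical.

```python
import itertools
import math


def hurwitz_units():
    """Return the 24 unit Hurwitz quaternions (the group 2T) as (w, x, y, z) tuples."""
    units = []
    # 8 axis units: +-1, +-i, +-j, +-k
    for axis in range(4):
        for sign in (1.0, -1.0):
            q = [0.0] * 4
            q[axis] = sign
            units.append(tuple(q))
    # 16 half-integer units: (+-1 +- i +- j +- k) / 2
    units.extend(itertools.product((0.5, -0.5), repeat=4))
    return units


def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    return (
        a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3],
        a[0] * b[1] + a[1] * b[0] + a[2] * b[3] - a[3] * b[2],
        a[0] * b[2] - a[1] * b[3] + a[2] * b[0] + a[3] * b[1],
        a[0] * b[3] + a[1] * b[2] - a[2] * b[1] + a[3] * b[0],
    )


def quantize_direction(chunk, codebook):
    """Split a 4-element chunk into (norm, index of the nearest codebook direction).

    Assumption for this sketch: the norm is kept as a separate float.
    """
    norm = math.sqrt(sum(c * c for c in chunk))
    if norm == 0.0:
        return 0.0, 0
    # All codebook entries have unit length, so the nearest direction is the
    # one with the largest dot product against the chunk.
    best = max(
        range(len(codebook)),
        key=lambda i: sum(a * b for a, b in zip(chunk, codebook[i])),
    )
    return norm, best


def dequantize(norm, index, codebook):
    """Rebuild an approximate chunk from its norm and codebook index."""
    return tuple(norm * c for c in codebook[index])


if __name__ == "__main__":
    codebook = hurwitz_units()
    norm, idx = quantize_direction((0.9, 1.1, -1.0, 0.2), codebook)
    print(idx, dequantize(norm, idx, codebook))
```

With only 24 directions, an index fits in 5 bits per 4-element chunk, which is why a single group alone gives a coarse codebook; the product structure described in the abstract is presumably what refines it.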