NeUQI: Near-Optimal Uniform Quantization Parameter Initialization for Low-Bit LLMs

Source: arXiv cs.CL | 2026-06-01 | News (en) | Authors: Li Lin, Xinyu Hu, Xiaojun Wan

Abstract

arXiv:2505.17595v4 (announce type: replace-cross)

Large language models (LLMs) achieve impressive performance across domains but face significant challenges when deployed on consumer-grade GPUs or personal devices such as laptops, due to high memory consumption and inference costs. Post-training quantization (PTQ) of LLMs offers a promising solution that reduces their memory footprint and decoding latency. In practice, PTQ with uniform quantization representation is favored due to its efficiency and ease of deployment, as uniform quantization is widely supported by mainstream hardware and software libraries. Recent studies on low-bit uniform quantization have led to noticeable improvements in post-quantization model performance; however, they mainly focus on quantization methodologies, while the initialization of quantization parameters remains underexplored and still relies on the conventional Min-Max formula.
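For readers unfamiliar with the conventional Min-Max initialization mentioned above, here is a minimal pure-Python sketch. The abstract does not spell out the formula, so this assumes the common asymmetric, per-tensor, round-to-nearest form: with b bits, scale s = (max(w) - min(w)) / (2^b - 1) and zero point z = round(-min(w) / s). The function names (`minmax_params`, `fake_quantize`, `mse`, `clipped_search`) and the shrink ratios are hypothetical. `clipped_search` is a generic range-shrinking grid search included only to show that the Min-Max choice is not necessarily the error-minimizing one; it is not the NeUQI method.

```python
from typing import List, Optional, Sequence, Tuple


def minmax_params(w: Sequence[float], bits: int) -> Tuple[float, int]:
    """Conventional Min-Max initialization for asymmetric uniform quantization."""
    qmax = 2 ** bits - 1
    lo, hi = min(w), max(w)
    # Guard against a constant tensor, where hi - lo would be zero.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    # Integer zero point that maps lo to (approximately) level 0.
    zero_point = round(-lo / scale)
    return scale, zero_point


def fake_quantize(w: Sequence[float], scale: float, zero_point: int, bits: int) -> List[float]:
    """Round each weight to an integer level in [0, 2^bits - 1], then map it back to a float."""
    qmax = 2 ** bits - 1
    out = []
    for x in w:
        # Python's round() rounds exact halves to the nearest even integer.
        q = min(max(round(x / scale) + zero_point, 0), qmax)
        out.append(scale * (q - zero_point))
    return out


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared reconstruction error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def clipped_search(w: Sequence[float], bits: int, ratios: Sequence[float]) -> Tuple[float, float, int]:
    """Hypothetical comparison baseline (not NeUQI): shrink the Min-Max range by
    each ratio and return the (error, scale, zero_point) with the lowest MSE."""
    qmax = 2 ** bits - 1
    lo, hi = min(w), max(w)
    best: Optional[Tuple[float, float, int]] = None
    for r in ratios:
        c_lo, c_hi = r * lo, r * hi
        scale = (c_hi - c_lo) / qmax if c_hi > c_lo else 1.0
        zero_point = round(-c_lo / scale)
        err = mse(w, fake_quantize(w, scale, zero_point, bits))
        if best is None or err < best[0]:
            best = (err, scale, zero_point)
    if best is None:
        raise ValueError("ratios must be non-empty")
    return best


if __name__ == "__main__":
    # Toy weights with one outlier (2.10) that stretches the Min-Max range.
    weights = [-0.82, -0.31, -0.05, 0.02, 0.11, 0.27, 0.40, 2.10]
    bits = 2
    scale, zero_point = minmax_params(weights, bits)
    base = mse(weights, fake_quantize(weights, scale, zero_point, bits))
    err, best_scale, best_zp = clipped_search(weights, bits, [1.0, 0.95, 0.9, 0.85, 0.8])
    print(f"Min-Max: scale={scale:.4f}, zero_point={zero_point}, MSE={base:.4f}")
    print(f"Best clipped: scale={best_scale:.4f}, zero_point={best_zp}, MSE={err:.4f}")
```

Because the search includes the ratio 1.0, it can never report a higher error than Min-Max on the same weights; whether any shrink ratio actually helps depends on the weight distribution and the bit width.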
