NeUQI: Near-Optimal Uniform Quantization Parameter Initialization for Low-Bit LLMs

ArXiv CS.CL | 2026-06-01 | Authors: Li Lin, Xinyu Hu, Xiaojun Wan

Abstract

arXiv:2505.17595v4

Large language models (LLMs) achieve impressive performance across domains but face significant challenges when deployed on consumer-grade GPUs or personal devices such as laptops, due to high memory consumption and inference costs. Post-training quantization (PTQ) of LLMs offers a promising solution that reduces their memory footprint and decoding latency. In practice, PTQ with uniform quantization representation is favored due to its efficiency and ease of deployment, as uniform quantization is widely supported by mainstream hardware and software libraries. Recent studies on low-bit uniform quantization have led to noticeable improvements in post-quantization model performance; however, they mainly focus on quantization methodologies, while the initialization of quantization parameters remains underexplored and still relies on the conventional Min-Max formula.
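The abstract mentions the conventional Min-Max formula without writing it out. As a rough illustration only, the sketch below assumes the commonly used asymmetric form, in which the scale and zero-point are derived from the smallest and largest weight in a group. The function names (`minmax_init`, `quantize`, `dequantize`) and the sample weights are hypothetical and are not taken from the paper; this is the baseline initialization the abstract refers to, not the NeUQI method.

```python
# Minimal sketch of Min-Max initialization for asymmetric uniform quantization.
# Assumption: the standard form scale = (max - min) / (2^bits - 1),
# zero_point = round(-min / scale). All names here are illustrative.

def minmax_init(weights, bits):
    """Derive (scale, zero_point) from the min and max of a weight group."""
    levels = 2 ** bits - 1              # highest integer code
    w_min, w_max = min(weights), max(weights)
    if w_max == w_min:
        # Arbitrary fallback for a constant group, only to avoid dividing by zero.
        return 1.0, 0
    scale = (w_max - w_min) / levels
    zero_point = round(-w_min / scale)
    return scale, zero_point


def quantize(weights, scale, zero_point, bits):
    """Map each weight to an integer code clamped to [0, 2^bits - 1]."""
    levels = 2 ** bits - 1
    return [min(max(round(w / scale) + zero_point, 0), levels) for w in weights]


def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate real-valued weights."""
    return [scale * (q - zero_point) for q in codes]


if __name__ == "__main__":
    weights = [-1.0, -0.4, 0.1, 0.7, 2.0]   # made-up sample values
    bits = 2
    scale, zero_point = minmax_init(weights, bits)
    codes = quantize(weights, scale, zero_point, bits)
    print(scale, zero_point, codes, dequantize(codes, scale, zero_point))
```

With only four levels at 2 bits, the range endpoints alone determine the grid, so the single outlier at 2.0 fixes the step size for every other weight. That sensitivity is one reason the choice of initialization can matter at low bit widths.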
