Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training

ArXiv CS.AI, 2026-05-26. Authors: Ayush K. Varshney, Konstantinos Vandikas, Šarūnas Girdzijauskas, Adam Orucu, Aneta Vulgarakis Feljan

Abstract

arXiv:2605.25054v1 (announce type: cross)

Deploying deep neural networks on resource-constrained 6G edge devices demands aggressive compression with minimal accuracy loss. Quantization-Aware Training (QAT) has emerged as a leading compression approach; however, existing mixed-precision methods typically operate at coarse layer- or channel-level granularity. These methods often rely on heuristic or search-based bit-allocation strategies, which may overlook fine-grained variability at the neuron level. We propose Neuron-Level Mixed-Precision QAT (NMP-QAT), in which each neuron independently learns its own discrete precision during training. Starting from low-bit precision, NMP-QAT expands a neuron's bit-width only when training signals demand it, using differentiable surrogates and straight-through estimators while preserving a fully discrete inference graph. This adaptability extends to both weights and activations, reducing memory movement.
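To make the idea of per-neuron precision concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It applies symmetric uniform fake-quantization to each neuron (one weight row) with that neuron's own bit-width. The abstract describes bit-widths that are learned through differentiable surrogates; here a simple error-threshold rule stands in for that mechanism purely for illustration. All names (`quantize_neuron`, `allocate_bits`, `ste_grad`) and the parameters `start_bits`, `max_bits`, and `tol` are hypothetical.

```python
def quantize_neuron(weights, bits):
    """Symmetric uniform fake-quantization of one neuron's weight row."""
    qmax = 2 ** (bits - 1) - 1            # largest signed integer level
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)              # nothing to quantize
    scale = max_abs / qmax                # step size for this neuron
    # Round to the nearest level, clamp to [-qmax, qmax], map back to floats.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def ste_grad(upstream_grad):
    """Straight-through estimator: the backward pass treats round() as identity."""
    return list(upstream_grad)


def mean_sq_error(a, b):
    """Mean squared difference between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def allocate_bits(layer, start_bits=2, max_bits=8, tol=1e-4):
    """ASSUMPTION: an error-threshold heuristic, not the paper's learned rule.

    Each neuron starts at start_bits and gains one bit at a time until its
    quantization error drops to tol or below, or max_bits is reached.
    """
    bits = []
    for row in layer:
        b = start_bits
        while b < max_bits and mean_sq_error(row, quantize_neuron(row, b)) > tol:
            b += 1
        bits.append(b)
    return bits


# Toy layer: neuron 0 is exactly representable at 2 bits, neuron 1 is not.
layer = [
    [0.5, -0.5, 0.5, -0.5],
    [0.9, 0.13, -0.41, 0.07],
]
bits = allocate_bits(layer)
quantized = [quantize_neuron(row, b) for row, b in zip(layer, bits)]
avg_bits = sum(bits) / len(bits)
print("bits per neuron:", bits, "average:", avg_bits)
```

In this toy layer the first neuron stays at the starting precision while the second is given more bits, so the average bit-width falls below what a single layer-wide setting would need to reach the same error tolerance.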
