Mapping the Schedule x Bit-Width Boundary in Sub-100M Quantisation-Aware Training

Event: PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM

arXiv:2605.25966v1. Announce Type: cross.

Abstract: We test whether the optimal learning-rate schedule depends on bit-width during from-initialisation quantisation-aware training (QAT) for sub-100M decoder language models. A 720-run factorial grid (Phase 2) over bit-width x warmdown fraction x LR magnitude x model size x seed (FP16/INT8/INT6, 15M-100M, 5 seeds) finds that the optimal warmdown fraction is 33% at every (bit-width,
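
To make the "warmdown fraction" axis of the grid concrete, here is a minimal Python sketch of a schedule that holds the learning rate at its peak and then decays it over the final fraction of training. The abstract does not state the decay shape, the final learning rate, or whether a warmup is used, so the linear decay to zero, the absence of warmup, and the names `lr_at_step` and `remaining_combinations` are all illustrative assumptions, not the paper's implementation. The second helper only does the arithmetic implied by a full factorial design: with 3 bit-widths and 5 seeds, 720 runs leave 48 combinations of warmdown fraction, LR magnitude, and model size.

```python
def lr_at_step(step, total_steps, peak_lr, warmdown_frac=0.33):
    """Hold peak_lr, then decay linearly to zero over the last warmdown_frac of training.

    Assumption: linear decay to zero with no warmup; the source does not specify these.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0.0 <= warmdown_frac <= 1.0:
        raise ValueError("warmdown_frac must lie in [0, 1]")

    # Clamp the step so out-of-range queries behave predictably.
    step = max(0, min(step, total_steps))

    warmdown_steps = int(round(total_steps * warmdown_frac))
    if warmdown_steps == 0:
        return peak_lr  # no warmdown phase at all

    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return peak_lr  # constant phase

    # Linear ramp from peak_lr (at warmdown_start) down to 0 (at total_steps).
    remaining = total_steps - step
    return peak_lr * remaining / warmdown_steps


def remaining_combinations(total_runs, n_bit_widths, n_seeds):
    """Runs per (bit-width, seed) cell, assuming a full factorial grid."""
    cell = n_bit_widths * n_seeds
    if total_runs % cell != 0:
        raise ValueError("total_runs is not divisible by n_bit_widths * n_seeds")
    return total_runs // cell


if __name__ == "__main__":
    # Hypothetical run length and peak LR, chosen only for illustration.
    for s in (0, 669, 670, 835, 1000):
        print(s, lr_at_step(s, total_steps=1000, peak_lr=1e-3))
    # FP16/INT8/INT6 and 5 seeds, as listed in the abstract.
    print(remaining_combinations(720, n_bit_widths=3, n_seeds=5))
```

Sweeping `warmdown_frac` while holding the other arguments fixed reproduces one axis of such a grid. Swapping the linear ramp for another decay shape only changes the last line of `lr_at_step`.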