Beyond Output Matching: Preserving Internal Geometry in NVFP4 LLM Distillatio 文章

ArXiv CS.AI2026-06-06NEWSen作者: Fangbo Tu, Junhua Zhao, Chi Liu, Xin Chen, Haifeng Wu, Jian Wan, Srinivasan Manoharan

摘要

arXiv:2606.05682v1 Announce Type: new Abstract: Demand for low-precision inference, including NVFP4-based approaches, has grown as large language models are increasingly deployed in latency and cost constrained production environments. Quantization-aware distillation (QAD) helps recover accuracy lost under low bit quantization by training a quantized student to match the output distribution of a frozen higher precision teacher via a KL-divergence loss. In this work, we first provide a representation level diagnosis of QAD: output matching alone can mask internal degradation, because many intermediate activation geometries can yield similar teacher-aligned logits. Using CKA, we show that KL-only QAD can reduce layerwise representational similarity relative to the BF16 teacher, with especially severe drift in RL-post-trained models.

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据