Rethinking the Role of Temperature in Large Language Model Distillation
Event: PRODUCT_LAUNCH | Date: 2026-06-02 | Impact: MEDIUM
arXiv:2606.00306v1 Announce Type: cross
Abstract: Reverse Kullback-Leibler (RKL) divergence is widely favored over forward KL (FKL) in large language model (LLM) distillation, yet this preference is largely based on comparisons that omit the temperature $\tau$, overlooking its central role in softening teacher distributions and improving knowledge transfer. In this work, we revisit temperature in LLM distillation and show
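
For readers unfamiliar with the quantities named in the abstract, the following is a minimal illustrative sketch, not the paper's method. It assumes the conventional setup in which the temperature $\tau$ divides both teacher and student logits before the softmax, and it compares FKL, KL(teacher || student), with RKL, KL(student || teacher), on a single token position. The logit values and the function names (`softmax`, `kl`, `forward_kl`, `reverse_kl`) are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax; a larger tau yields a flatter distribution."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def forward_kl(teacher_logits, student_logits, tau=1.0):
    """FKL: KL(teacher || student). Assumes tau is applied to both models."""
    p = softmax(teacher_logits, tau)
    q = softmax(student_logits, tau)
    return kl(p, q)

def reverse_kl(teacher_logits, student_logits, tau=1.0):
    """RKL: KL(student || teacher). Assumes tau is applied to both models."""
    p = softmax(teacher_logits, tau)
    q = softmax(student_logits, tau)
    return kl(q, p)

# Hypothetical logits for one token position over a 4-word vocabulary.
teacher_logits = [4.0, 1.0, 0.5, -1.0]
student_logits = [2.0, 1.5, 0.0, -0.5]

for tau in (1.0, 2.0, 4.0):
    fkl = forward_kl(teacher_logits, student_logits, tau)
    rkl = reverse_kl(teacher_logits, student_logits, tau)
    peak = max(softmax(teacher_logits, tau))
    print(f"tau={tau}: teacher peak prob={peak:.3f}, FKL={fkl:.4f}, RKL={rkl:.4f}")
```

Running the loop shows the teacher's peak probability shrinking as $\tau$ grows, which is the "softening" the abstract refers to. Some distillation setups also multiply the loss by $\tau^2$ to keep gradient magnitudes comparable across temperatures; that scaling is omitted here.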