Rethinking the Role of Temperature in Large Language Model Distillation · Event

PRODUCT_LAUNCH · 2026-06-02 · Impact: MEDIUM

arXiv:2606.00306v1 · Announce Type: cross

Abstract: Reverse Kullback-Leibler (RKL) divergence is widely favored over forward KL (FKL) in large language model (LLM) distillation, yet this preference rests largely on comparisons that omit the temperature $\tau$, overlooking its central role in softening teacher distributions and improving knowledge transfer. In this work, we revisit temperature in LLM distillation and show …
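The quantities the abstract compares can be illustrated with a minimal Python sketch: a temperature-scaled softmax, plus forward KL (teacher to student) and reverse KL (student to teacher) for a single next-token distribution. It assumes both teacher and student logits are softened with the same $\tau$. The function names and example logits are hypothetical, and this is not the paper's method, only the textbook definitions.

```python
import math

def softmax_with_temperature(logits, tau):
    """Temperature-scaled softmax: a larger tau yields a flatter (softer) distribution."""
    if tau <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max logit for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy in nats; higher means a softer distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def forward_kl(teacher_logits, student_logits, tau):
    """FKL = KL(teacher || student): penalizes the student for missing teacher mass."""
    p = softmax_with_temperature(teacher_logits, tau)
    q = softmax_with_temperature(student_logits, tau)
    return kl(p, q)

def reverse_kl(teacher_logits, student_logits, tau):
    """RKL = KL(student || teacher): penalizes student mass where the teacher has little."""
    p = softmax_with_temperature(teacher_logits, tau)
    q = softmax_with_temperature(student_logits, tau)
    return kl(q, p)

if __name__ == "__main__":
    # Hypothetical logits over a 4-token vocabulary, for illustration only.
    teacher_logits = [4.0, 1.0, 0.5, -1.0]
    student_logits = [2.0, 2.0, 0.0, -0.5]
    for tau in (1.0, 2.0, 4.0):
        p = softmax_with_temperature(teacher_logits, tau)
        print(
            f"tau={tau}: teacher entropy={entropy(p):.3f}, "
            f"FKL={forward_kl(teacher_logits, student_logits, tau):.4f}, "
            f"RKL={reverse_kl(teacher_logits, student_logits, tau):.4f}"
        )
```

Raising $\tau$ increases the teacher's entropy, exposing more of its probability mass on non-top tokens. This is the softening effect the abstract says comparisons without $\tau$ overlook. In practice the per-token loss is summed or averaged over sequence positions. Hinton-style distillation also commonly multiplies it by $\tau^2$ to keep gradient magnitudes comparable across temperatures. Whether the paper uses that scaling is not stated in this excerpt.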
