Rethinking the Role of Temperature in Large Language Model Distillation
Event: PRODUCT_LAUNCH | Date: 2026-06-02 | Impact: MEDIUM
arXiv:2606.00306v1 Announce Type: cross
Abstract: Reverse Kullback-Leibler (RKL) divergence is widely favored over forward KL (FKL) in large language model (LLM) distillation, yet this preference is largely based on comparisons that omit the temperature $\tau$, overlooking its central role in softening teacher distributions and improving knowledge transfer. In this work, we revisit temperature in LLM distillation and show
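For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of how a temperature $\tau$ softens a distribution and how FKL and RKL differ in argument order. It is not the paper's method. The logits are hypothetical, and applying the same $\tau$ to both teacher and student is an assumption borrowed from common distillation practice.

```python
import math

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax: p_i proportional to exp(z_i / tau)."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for discrete distributions; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy in nats; higher means a softer (flatter) distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def distill_losses(teacher_logits, student_logits, tau=1.0):
    """Forward and reverse KL between temperature-softened distributions.

    Assumption: the same tau is applied to teacher and student.
    """
    p = softmax(teacher_logits, tau)  # teacher distribution
    q = softmax(student_logits, tau)  # student distribution
    return {"fkl": kl(p, q), "rkl": kl(q, p)}

if __name__ == "__main__":
    # Hypothetical logits over a three-token vocabulary.
    teacher = [4.0, 1.0, 0.0]
    student = [2.0, 1.5, 0.5]
    for tau in (1.0, 2.0, 4.0):
        losses = distill_losses(teacher, student, tau)
        h = entropy(softmax(teacher, tau))
        print(f"tau={tau}: teacher entropy={h:.4f}, "
              f"FKL={losses['fkl']:.4f}, RKL={losses['rkl']:.4f}")
```

In this sketch, FKL is KL(teacher || student) and RKL is KL(student || teacher). Raising $\tau$ flattens the teacher distribution, which shows up as higher entropy.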