Rethinking the Role of Temperature in Large Language Model Distillation
Event type: PRODUCT_LAUNCH | Date: 2026-06-02 | Impact: MEDIUM
arXiv:2606.00306v1. Announce type: cross.
Abstract: Reverse Kullback-Leibler (RKL) divergence is widely favored over forward KL (FKL) in large language model (LLM) distillation, yet this preference is largely based on comparisons that omit the temperature $\tau$, overlooking its central role in softening teacher distributions and improving knowledge transfer. In this work, we revisit temperature in LLM distillation and show
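For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of temperature-softened distributions and the two divergences it names. It is not the paper's method. The function names are hypothetical, and applying the same $\tau$ to both teacher and student logits is an assumption borrowed from conventional knowledge distillation.

```python
import math

def softmax_with_temperature(logits, tau=1.0):
    """Softmax of logits / tau. tau > 1 flattens the distribution, tau < 1 sharpens it."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def distillation_divergences(teacher_logits, student_logits, tau=1.0):
    """Return forward KL (teacher || student) and reverse KL (student || teacher).

    Assumption: both sets of logits are softened with the same tau.
    """
    p = softmax_with_temperature(teacher_logits, tau)  # teacher distribution
    q = softmax_with_temperature(student_logits, tau)  # student distribution
    return {"fkl": kl_divergence(p, q), "rkl": kl_divergence(q, p)}

if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]  # illustrative logits, not from the paper
    student = [1.5, 1.2, 0.3]
    for tau in (1.0, 2.0, 4.0):
        print(tau, distillation_divergences(teacher, student, tau))
```

Raising `tau` spreads the teacher's probability mass over more tokens, which is the "softening" the abstract refers to. Forward KL weights the log-ratio by the teacher distribution, while reverse KL weights it by the student distribution.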
Related coverage (1)
Rethinking the Role of Temperature in Large Language Model Distillation
ArXiv CS.AI, 2026-06-02