Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning
Event: PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM
arXiv:2605.11458v2 (announce type: replace-cross)
Abstract: On-policy self-distillation has become a strong recipe for LLM reasoning, where a privileged teacher supervises the student's own rollouts while conditioning on the reference solution. A design choice shared by nearly all such methods, however, has gone unquestioned: the teacher always sees the full reference reasoning. We argue that this default itself is part of the probl
Related coverage
Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning
ArXiv CS.CL, 2026-05-28