Event: Are Full Rollouts Necessary for On-Policy Distillation?
Type: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM
arXiv:2605.31490v1 (Announce Type: new)

Abstract: On-policy distillation (OPD) provides dense teacher feedback along rollouts generated by the student and has emerged as a promising post-training paradigm for long-horizon reasoning. However, standard OPD typically generates full rollouts during training, which is computationally expensive and may expose the student to unreliable teacher feedback at late rollout positions, especially during e…
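The abstract describes standard OPD only at a high level. The sketch below assumes the common formulation in which the teacher's dense feedback is a per-token reverse KL divergence, KL(student ‖ teacher), evaluated at every position of a rollout sampled from the student. The abstract does not state the paper's exact loss. The code computes that signal for a full rollout and for a truncated one. `opd_rollout_loss`, `toy_student`, `toy_teacher`, and the 3-token vocabulary are hypothetical. No gradients are computed, and this is not the paper's proposed method.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy "model": maps a token prefix to logits over a small vocabulary.
LogitsFn = Callable[[Sequence[int]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(p_student: Sequence[float], p_teacher: Sequence[float]) -> float:
    """KL(student || teacher) at one position: the assumed per-token OPD signal."""
    return sum(ps * (math.log(ps) - math.log(pt))
               for ps, pt in zip(p_student, p_teacher) if ps > 0.0)


def opd_rollout_loss(student: LogitsFn, teacher: LogitsFn, prompt: Sequence[int],
                     rollout_len: int, rng: random.Random) -> Tuple[float, int]:
    """Sample `rollout_len` tokens from the student and average the teacher's
    per-token reverse-KL feedback along that on-policy trajectory.

    A "full" rollout uses the whole generation budget; a truncated rollout
    passes a smaller `rollout_len`, so fewer positions are generated and scored.
    Returns (mean per-token loss, number of positions scored).
    """
    prefix = list(prompt)
    per_token: List[float] = []
    for _ in range(rollout_len):
        p_s = softmax(student(prefix))
        p_t = softmax(teacher(prefix))
        per_token.append(reverse_kl(p_s, p_t))                # dense feedback at this position
        token = rng.choices(range(len(p_s)), weights=p_s)[0]  # the student picks the next token
        prefix.append(token)
    mean = sum(per_token) / len(per_token) if per_token else 0.0
    return mean, len(per_token)


def toy_student(prefix: Sequence[int]) -> List[float]:
    return [0.0, 1.0, 0.5]  # hypothetical fixed student preferences


def toy_teacher(prefix: Sequence[int]) -> List[float]:
    # Hypothetical teacher whose preference for token 0 grows with position.
    return [0.2 * len(prefix), 1.0, 0.0]


full_loss, full_n = opd_rollout_loss(toy_student, toy_teacher, [1, 2], 16, random.Random(0))
trunc_loss, trunc_n = opd_rollout_loss(toy_student, toy_teacher, [1, 2], 4, random.Random(0))
print(full_n, trunc_n)  # 16 4
```

Shortening `rollout_len` reduces student sampling and teacher scoring linearly, which is the cost the abstract attributes to full rollouts. Whether truncation preserves training quality is the question the paper studies, and this toy does not answer it.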
Source: arXiv cs.CL, 2026-06-01