Are Full Rollouts Necessary for On-Policy Distillation? (Event)

PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM

arXiv:2605.31490v1 | Announce Type: new | Abstract: On-policy distillation (OPD) provides dense teacher feedback along rollouts generated by the student and has emerged as a promising post-training paradigm for long-horizon reasoning. However, standard OPD typically generates full rollouts during training, which is computationally expensive and may expose the student to unreliable teacher feedback at late rollout positions, especially during e