Less is More: Early Stopping Rollout for On-Policy Distillation · Event

PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2605.27028v1
Announce Type: cross
Abstract: On-policy distillation has recently emerged as a promising alternative to standard sequence-level imitation, training a student by scoring its own rollouts with a teacher model. However, we observe an "Off-policy Teacher Decay" problem in this paradigm: for later tokens, with the student's earlier trajectory as context that is off-policy to the teacher, the teacher's ability to produ
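
The setup the abstract describes, a teacher scoring the student's own rollout token by token, can be illustrated with a small sketch. This is not the paper's method: the per-token reverse KL objective, the fixed token budget `max_tokens`, and all function names below are assumptions chosen for illustration, and the logits are plain Python lists rather than model outputs.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) at a single token position (assumed objective)."""
    p = softmax(student_logits)
    q = softmax(teacher_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def truncated_distill_loss(student_steps, teacher_steps, max_tokens):
    """Mean per-token reverse KL over the first `max_tokens` rollout positions.

    `student_steps[i]` and `teacher_steps[i]` are the two models' logits at
    position i of a rollout sampled from the student. Positions past
    `max_tokens` are ignored, which stands in for stopping the rollout early.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    kept = list(zip(student_steps, teacher_steps))[:max_tokens]
    if not kept:
        return 0.0
    return sum(reverse_kl(s, t) for s, t in kept) / len(kept)


# Hypothetical rollout: the models agree on the first two positions
# and disagree on the third.
student_steps = [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0]]
teacher_steps = [[0.0, 0.0], [0.0, 0.0], [0.0, 2.0]]

early = truncated_distill_loss(student_steps, teacher_steps, max_tokens=2)
full = truncated_distill_loss(student_steps, teacher_steps, max_tokens=3)
```

In this toy rollout the truncated loss only sees the positions where the two models agree, while the full-length loss also includes the late position where they diverge. How the paper actually chooses the stopping point is not stated in the visible part of the abstract.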
