Less is More: Early Stopping Rollout for On-Policy Distillation
Event: PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM
arXiv:2605.27028v1 Announce Type: cross
Abstract: On-policy distillation has recently emerged as a promising alternative to standard sequence-level imitation, training a student by scoring its own rollouts with a teacher model. However, we observe an "Off-policy Teacher Decay" problem in this paradigm: for later tokens, with the student's earlier trajectory as context that is off-policy to the teacher, the teacher's ability to produ
Related coverage (1)
Less is More: Early Stopping Rollout for On-Policy Distillation
ArXiv CS.AI, 2026-05-27