Your Teacher Can't Help You Here: Combating Supervision Fidelity Decay in On-Policy Distillation
Event: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM
arXiv:2605.30833v1
Announce Type: new

Abstract: On-policy distillation transfers reasoning capabilities by training a student model on its own generated trajectories using token-level feedback from a teacher. However, we identify a critical bottleneck, Supervision Fidelity Decay (SFD): as student-generated prefixes lengthen, the teacher's next-token distribution becomes less confident and l…
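
To make the two quantities in the abstract concrete, here is a minimal Python sketch of how one might inspect a single student-generated trajectory position by position. It rests on assumptions the abstract does not confirm: teacher confidence is measured here by Shannon entropy, the token-level feedback is taken to be a per-token reverse KL divergence, and the helper names (`entropy`, `reverse_kl`, `per_position_report`) and the toy distributions are hypothetical. It is an illustration of the general idea, not the paper's method.

```python
import math

def entropy(dist):
    """Shannon entropy (in nats) of a probability distribution given as a list."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)

def reverse_kl(student, teacher, eps=1e-12):
    """KL(student || teacher) at one token position; eps guards against log(0)."""
    return sum(
        s * math.log(s / max(t, eps))
        for s, t in zip(student, teacher)
        if s > 0.0
    )

def per_position_report(student_dists, teacher_dists):
    """Return (teacher_entropy, reverse_kl) for each position of one trajectory."""
    return [
        (entropy(t), reverse_kl(s, t))
        for s, t in zip(student_dists, teacher_dists)
    ]

# Hypothetical toy data (assumption): a 3-token vocabulary and 3 positions
# along one student-generated trajectory. The teacher rows are deliberately
# flatter at later positions to mimic a teacher that grows less confident.
teacher = [[0.90, 0.05, 0.05], [0.60, 0.20, 0.20], [0.34, 0.33, 0.33]]
student = [[0.80, 0.10, 0.10], [0.70, 0.20, 0.10], [0.50, 0.30, 0.20]]

for pos, (h, kl) in enumerate(per_position_report(student, teacher)):
    print(f"position {pos}: teacher entropy = {h:.3f} nats, "
          f"KL(student || teacher) = {kl:.3f}")
```

With real models, the per-position distributions would come from a softmax over teacher and student logits evaluated on the same student-generated prefix. A teacher entropy that rises with position would be one possible symptom of the decay the abstract describes.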