When Should the Teacher Move? Temporal Coupling and Stability in Self On-Policy Distillation

PRODUCT_LAUNCH 2026-06-03 Impact: MEDIUM

arXiv:2606.03532v1 Announce Type: cross

Abstract: Self on-policy distillation trains a student policy against a teacher derived from its own parameter history, yet the teacher's update schedule -- which governs the \emph{temporal coupling} between teacher and student -- has not been systematically studied as a stability variable. Through a controlled schedule sweep on Qwen3-8B, we establish that \emph{i