Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance
Event: PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM
arXiv:2606.00305v1 (announce type: new)
Abstract: On-Policy Distillation (OPD) improves large language model reasoning by training a student model on trajectories sampled from its own policy under teacher supervision. Although OPD operates on trajectories, its learning signal remains token-level: it identifies deviations through high-loss tokens and repairs them through local reverse-KL correction. We show that thi…
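The token-level signal described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy logits, and the fixed `threshold` used to flag "high-loss" tokens are all assumptions made for illustration. The sketch computes the reverse KL divergence, KL(student || teacher), at each position of a student-sampled trajectory and lists the positions whose loss exceeds the threshold.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl_per_token(student_logits, teacher_logits):
    """Reverse KL, KL(student || teacher), at each trajectory position.

    Both arguments are lists of rows, one row of logits per token position.
    """
    losses = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        s_logp = log_softmax(s_row)
        t_logp = log_softmax(t_row)
        # Expectation under the student distribution of the log-ratio.
        kl = sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s_logp, t_logp))
        losses.append(kl)
    return losses


def high_loss_positions(losses, threshold):
    """Hypothetical rule: flag positions whose loss exceeds a fixed threshold."""
    return [i for i, value in enumerate(losses) if value > threshold]


# Toy trajectory of three positions over a three-token vocabulary (made-up values).
student = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 1.0]]
teacher = [[2.0, 0.0, 0.0], [3.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

losses = reverse_kl_per_token(student, teacher)
flagged = high_loss_positions(losses, threshold=0.5)
print(losses, flagged)
```

In this toy case the student and teacher disagree only at the second position, so that is the only position flagged. Each position is scored independently of its neighbours, which is what makes the signal local.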
Related coverage (1)
Bridging Reasoning Trajectories in On-Policy Distillation via Near-Future Guidance
ArXiv CS.CL, 2026-06-02