ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation
Event: PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM
arXiv:2605.28396v1 | Announce Type: cross
Abstract: On-policy distillation (OPD) transfers reasoning behavior by training a student on teacher feedback along student-generated trajectories, but standard full-rollout training ties every update to a costly completion and can over-allocate supervision to late positions with low marginal value for the current student. We revisit this assumption through the useful supervision horizon: st
Source: ArXiv CS.AI, 2026-05-28