On the Geometry of On-Policy Distillation
Event: PRODUCT_LAUNCH, 2026-06-08. Impact: MEDIUM
arXiv:2606.07082v1. Announce Type: cross. Abstract: On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training dynamics remain poorly understood. We characterize the trajectory of OPD updates in parameter space and compare it with supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR). A suite of parameter-space diagnostics consistently places OPD in a relaxed off-principal
Related coverage (1)
On the Geometry of On-Policy Distillation
ArXiv CS.AI, 2026-06-08