On the Geometry of On-Policy Distillation (Event)

Name: On the Geometry of On-Policy Distillation
Start: 2026-06-08

PRODUCT_LAUNCH · 2026-06-08 · Impact: MEDIUM

On the Geometry of On-Policy Distillation
arXiv:2606.07082v1
Announce Type: cross
Abstract: On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training dynamics remain poorly understood. We characterize the trajectory of OPD updates in parameter space and compare it with supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR). A suite of parameter-space diagnostics consistently places OPD in a relaxed off-principal

Artificial Intelligence
