OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification (Event)

Name: OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification
Start: 2026-06-02

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.01476v1
Announce Type: cross

Abstract: On-Policy Distillation (OPD) trains a student model on its own generative trajectories under dense token-level feedback from a stronger teacher, mitigating both the off-policy distribution shift of Supervised Fine-Tuning (SFT) and the sparse credit assignment of Reinforcement Learning (RL). However, standard OPD faces two coupled limitations. First, it requires direct access

Artificial Intelligence
