OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification
Event
PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM
arXiv:2606.01476v1 (announce type: cross)
Abstract: On-Policy Distillation (OPD) trains a student model on its own generated trajectories under dense token-level feedback from a stronger teacher, mitigating both the off-policy distribution shift of Supervised Fine-Tuning (SFT) and the sparse credit assignment of Reinforcement Learning (RL). However, standard OPD faces two coupled limitations. First, it requires direct access
Related coverage
OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification
ArXiv CS.CL, 2026-06-02