OPD+: Rethinking the Advantage Design for On-Policy Distillation (Article)

ArXiv CS.AI · 2026-06-02 · News · en · Authors: Hanyang Zhao, Haoxian Chen, Han Lin, Genta Indra Winata, David Yao, Wenpin Tang
