OPD+: Rethinking the Advantage Design for On-Policy Distillation

arXiv cs.AI · 2026-06-02 · News · en
Authors: Hanyang Zhao, Haoxian Chen, Han Lin, Genta Indra Winata, David Yao, Wenpin Tang
