ROSD: Reflective On-Policy Self-Distillation for Language Model Reasoning across Domains

arXiv cs.CL · 2026-05-28 · Authors: Ziqi Zhao, Xinyu Ma, Liu Yang, Yujie Feng, Daiting Shi, Jingzhou He, Xin Xin, Zhaochun Ren, Xiao-Ming Wu
