Learning What to Learn: Stage-Specific Data Sets for SFT-then-RL in Small Language Model Reasoning

ArXiv CS.CL · 2026-06-04 · Authors: Chongyang He, Rui Zhang, Zixuan Wang, Xin Li
