Beyond Two-Stage Training: Cooperative SFT and RL for LLM Reasoning

ArXiv CS.CL · 2026-06-02 · Authors: Liang Chen, Xueting Han, Li Shen, Jing Bai, Kam-Fai Wong
