Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning

arXiv cs.CL · 2026-05-26 · Authors: Fei Ding, Yongkang Zhang, Runhao Liu, Yuhao Liao, Zijian Zeng, Sibo Wang, Huiming Yang
