StepPO: Step-Aligned Policy Optimization for Agentic Reinforcement Learning

ArXiv CS.CL · 2026-06-02 · Authors: Daoyu Wang, Qingchuan Li, Mingyue Cheng, Jie Ouyang, Shuo Yu, Qi Liu, Enhong Chen
