Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

ArXiv CS.CL · 2026-06-01 · NEWS · en · Authors: Yuqiao Tan, Minzheng Wang, Shizhu He, Huanxuan Liao, Chengfeng Zhao, Qiunan Lu, Tian Liang, Jun Zhao, Kang Liu

Related Technologies