CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning

ArXiv CS.CL · 2026-05-26
Authors: Ran Li, Zeyuan Liu, Yinghao Chen, Bingxiang He, Jiarui Yuan, Zixuan Fu, Weize Chen, Jinyi Hu, Chen Qian, Zhiyuan Liu, Maosong Sun
