How's it going? Reinforcement learning in language models recruits a functional welfare axis

ArXiv CS.CL · 2026-05-29 · NEWS · en · Authors: Andy Q Han, David J. Chalmers, Pavel Izmailov
