Latent Reasoning in TRMs is Secretly a Policy Improvement Operator

arXiv cs.CL · 2026-06-02 · Authors: Arip Asadulaev, Rayan Banerjee, Fakhri Karray, Martin Takac