Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data

arXiv cs.AI · 2026-06-04 · Authors: Yuval Ran-Milo, Yotam Alexander, Shahar Mendel, Nadav Cohen
