Quantifying Empirical Compute-Supervision Tradeoffs in RLVR

ArXiv CS.AI, 2026-05-26. Authors: Ryo Mitsuhashi, Patrick Chen, Isabelle Tseng, Jasin Cekinmez, Addison J. Wu