GRPO is Secretly a Process Reward Model

ArXiv CS.AI · 2026-05-29 · Authors: Michael Sullivan, Alexander Koller
