Credit Assignment with Resets in Language Model Reasoning

arXiv cs.AI | 2026-05-26
Authors: Ankur Samanta, Akshayaa Magesh, Ayush Jain, Youliang Yu, Daniel Jiang, Kavosh Asadi, Daniel Jiang, Kaveh Hassani, Paul Sajda, Jalaj Bhandari, Yonathan Efroni

Abstract

arXiv:2605.25507v1 (announce type: new)

Contemporary reinforcement learning with verifiable reward methods post-train language models on multi-step reasoning by assigning a single outcome reward uniformly across all tokens in a trajectory. Such uniform assignment ignores which steps contributed to success or failure. Improving credit assignment can address this limitation by enabling targeted refinement of faulty reasoning steps, rather than updating entire trajectories uniformly. Resets are one such simple mechanism, enabling more precise credit assignment by returning to an intermediate state and resampling counterfactual continuations, so that outcome differences can be attributed to decisions made at that point.
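The reset idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a generic Monte Carlo scheme: reset to each prefix of a trajectory, resample continuations, and credit each step with the change in estimated success rate it causes. The names `toy_policy`, `toy_verifier`, `estimate_value`, `reset_credits`, and the constants `HORIZON` and `P_CORRECT` are all hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence

HORIZON = 3       # assumed number of reasoning steps in the toy task
P_CORRECT = 0.8   # assumed chance that the toy policy takes a correct step


def toy_policy(prefix: Sequence[int], rng: random.Random) -> List[int]:
    """Hypothetical stand-in for a language model: extend `prefix` to a full trajectory."""
    steps = list(prefix)
    while len(steps) < HORIZON:
        steps.append(1 if rng.random() < P_CORRECT else 0)
    return steps


def toy_verifier(steps: Sequence[int]) -> float:
    """Hypothetical outcome reward: 1.0 only if every step is correct."""
    return 1.0 if all(s == 1 for s in steps) else 0.0


def estimate_value(
    prefix: Sequence[int],
    sample: Callable[[Sequence[int], random.Random], List[int]],
    reward: Callable[[Sequence[int]], float],
    n_samples: int,
    rng: random.Random,
) -> float:
    """Reset to `prefix` and average the outcome reward over resampled continuations."""
    total = 0.0
    for _ in range(n_samples):
        total += reward(sample(prefix, rng))
    return total / n_samples


def reset_credits(
    trajectory: Sequence[int],
    sample: Callable[[Sequence[int], random.Random], List[int]],
    reward: Callable[[Sequence[int]], float],
    n_samples: int = 200,
    seed: int = 0,
) -> List[float]:
    """Per-step credit: estimated value after the step minus estimated value before it."""
    rng = random.Random(seed)
    values = [
        estimate_value(trajectory[:t], sample, reward, n_samples, rng)
        for t in range(len(trajectory))
    ]
    values.append(reward(trajectory))  # the full trajectory has its actual outcome
    return [values[t + 1] - values[t] for t in range(len(trajectory))]


if __name__ == "__main__":
    # The second step is faulty, so the trajectory fails the verifier.
    for t, credit in enumerate(reset_credits([1, 0, 1], toy_policy, toy_verifier)):
        print(f"step {t}: credit {credit:+.3f}")
```

In this toy setting a uniform scheme would give all three steps the same outcome reward of 0. The reset-based estimate instead places the most negative credit on the faulty second step, and the final step receives exactly zero credit because the outcome was already determined before it.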