Credit Assignment with Resets in Language Model Reasoning

arXiv cs.AI | 2026-05-26 | News | Language: en
Authors: Ankur Samanta, Akshayaa Magesh, Ayush Jain, Youliang Yu, Daniel Jiang, Kavosh Asadi, Daniel Jiang, Kaveh Hassani, Paul Sajda, Jalaj Bhandari, Yonathan Efroni

Abstract

arXiv:2605.25507v1 (Announce Type: new)

Contemporary reinforcement learning with verifiable rewards (RLVR) methods post-train language models on multi-step reasoning by assigning a single outcome reward uniformly across all tokens in a trajectory. Such uniform assignment ignores which steps contributed to success or failure. Improving credit assignment can address this limitation by enabling targeted refinement of faulty reasoning steps instead of uniform updates to entire trajectories. Resets are one simple mechanism for doing so. By returning to an intermediate state and resampling counterfactual continuations, they allow outcome differences to be attributed to the decisions made at that point.
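The contrast between uniform assignment and reset-based credit can be illustrated with a toy Monte Carlo sketch. Everything in it is an assumption made for illustration and is not the authors' implementation. This includes the binary step encoding (1 = correct, 0 = faulty), the constants `HORIZON` and `P_CORRECT`, the "all steps correct" verifier, and the helper names `make_rollout`, `uniform_credit`, and `reset_credit`, all of which are hypothetical. Under uniform assignment every step receives the single outcome reward. Under reset-based credit, step t is scored by the difference between the estimated success rate after and before that step. Each rate is estimated by resetting to the corresponding prefix and resampling continuations.

```python
import random
from typing import Callable, List, Sequence

HORIZON = 4      # hypothetical: number of reasoning steps per trajectory
P_CORRECT = 0.9  # hypothetical: chance the toy policy emits a correct step


def make_rollout(rng: random.Random) -> Callable[[List[int]], float]:
    """Toy policy + verifier: complete a prefix, return the outcome reward."""
    def rollout(prefix: List[int]) -> float:
        steps = list(prefix)
        while len(steps) < HORIZON:
            # Sample the next step: 1 = correct, 0 = faulty (toy encoding).
            steps.append(1 if rng.random() < P_CORRECT else 0)
        # Verifiable outcome reward: 1.0 only if every step is correct.
        return 1.0 if all(steps) else 0.0
    return rollout


def uniform_credit(trajectory: Sequence[int], reward: float) -> List[float]:
    """Uniform assignment: every step receives the single outcome reward."""
    return [reward] * len(trajectory)


def reset_credit(trajectory: Sequence[int],
                 rollout: Callable[[List[int]], float],
                 num_resamples: int = 200) -> List[float]:
    """Hypothetical per-step credit from resets and resampled continuations."""
    def value(prefix: Sequence[int]) -> float:
        # Reset to this intermediate state and average the outcomes of
        # freshly sampled continuations (Monte Carlo value estimate).
        total = sum(rollout(list(prefix)) for _ in range(num_resamples))
        return total / num_resamples

    # values[t] estimates the success rate from the state before step t.
    values = [value(trajectory[:t]) for t in range(len(trajectory) + 1)]
    # Credit for step t: change in estimated success caused by the decision at t.
    return [values[t + 1] - values[t] for t in range(len(trajectory))]


rng = random.Random(0)
trajectory = [1, 1, 0, 1]  # the third step is faulty, so the outcome fails
reward = make_rollout(rng)(trajectory)
print(uniform_credit(trajectory, reward))  # [0.0, 0.0, 0.0, 0.0]
# The faulty third step should get a strongly negative credit and the
# final step (taken after the failure was already locked in) exactly 0.0.
print(reset_credit(trajectory, make_rollout(rng)))
```

Each per-step credit is a difference of adjacent value estimates. The credits therefore telescope to the final outcome minus the estimated success rate from the empty prefix. Uniform assignment gives the failed trajectory no signal about which step went wrong, while resets isolate the faulty step. The price is compute, since every reset point needs its own batch of sampled continuations.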
