RICE-PO: Turning Retrieval Interactions into Credit Signals for Reasoning Agents 文章

ArXiv CS.CL2026-05-27NEWSen作者: Mingchen Li, Hansi Zeng, Zhuo Qian, Jiatan Huang, Hamed Zamani, Hong Yu

摘要

arXiv:2605.26352v1 Announce Type: new Abstract: Retrieval is increasingly moving from one-shot matching toward interactive reasoning, where language agents iteratively inspect evidence, reformulate queries, and search again. Training such agents raises a credit-assignment challenge: executable actions such as queries or summaries can be directly evaluated by the retriever, while latent reasoning steps are not directly observable and only affect future executable actions. This asymmetry makes outcome-level reward assignment unreliable, as the same final reward may credit reasoning steps that did not actually shape retrieval success. We propose RICE-PO, a critic-free policy optimization framework that converts retrieval interactions into localized learning signals.

RICE-PO: Turning Retrieval Interactions into Credit Signals for Reasoning Agents 文章

摘要

相关事件查看全部 (1)

相关公司查看全部 (4)

相关人物

相关产品查看全部 (9)

相关技术查看全部 (16)