CAPF: Guiding Search-Agent Rollouts with Credit-Attenuated Privileged Feedback

arXiv cs.AI | 2026-06-02 | Authors: Bin Chen, Xinye Liao, Yiming Liu, Xin Liao, Chonghan Liu

Abstract

arXiv:2606.01830v1 (announce type: new)

Recent LLM search agents use reinforcement learning with verifiable rewards (RLVR) to learn search-augmented reasoning from outcome rewards. On hard problems, these agents rarely sample end-to-end successful rollouts, leaving outcome-only RLVR with few positive-reward trajectories. We argue that improving learning on such problems requires additional guidance during training, and that RLVR already contains verifier-side information that can provide it. This information can identify errors or omissions in the agent's submitted answer and guide revision within the rollout. We propose a training-time mechanism called Credit-Attenuated Privileged Feedback (CAPF), which makes this verifier-side information available through a Privileged Feedback call during training.
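To make the idea concrete, here is a minimal toy sketch in Python of a rollout that can issue a privileged feedback call and revise its answer. It is not the authors' implementation. The names `Verifier`, `privileged_feedback`, `attenuated_reward`, and `rollout_with_feedback` are hypothetical, the answer is modeled as a set of items, and the revision step simply applies the feedback. The abstract does not say how credit is attenuated, so the geometric discount per feedback call used here is purely an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass
class Verifier:
    """Toy verifier holding the gold answer set (hypothetical structure)."""
    gold: frozenset

    def outcome_reward(self, answer: set) -> float:
        # Outcome-only RLVR signal: 1.0 only for an exactly correct answer.
        return 1.0 if set(answer) == self.gold else 0.0

    def privileged_feedback(self, answer: set) -> dict:
        # Verifier-side information: which items are wrong and which are missing.
        return {
            "errors": sorted(set(answer) - self.gold),
            "omissions": sorted(self.gold - set(answer)),
        }


def attenuated_reward(verifier, answer, feedback_calls, attenuation=0.5):
    # ASSUMPTION (not from the abstract): credit shrinks geometrically
    # with each privileged feedback call the rollout used.
    return verifier.outcome_reward(answer) * (attenuation ** feedback_calls)


def rollout_with_feedback(verifier, first_answer, max_calls=1):
    """Submit an answer, then revise it using privileged feedback if it fails."""
    answer, calls = set(first_answer), 0
    while verifier.outcome_reward(answer) == 0.0 and calls < max_calls:
        fb = verifier.privileged_feedback(answer)
        calls += 1
        # Stand-in for the agent's revision step: apply the feedback directly.
        answer = (answer - set(fb["errors"])) | set(fb["omissions"])
    return answer, calls, attenuated_reward(verifier, answer, calls)
```

With outcome-only reward, a rollout whose first answer is wrong would receive 0. In this sketch it can still reach a correct answer after one feedback call, but it earns reduced credit for doing so, while a rollout that succeeds without help keeps the full reward.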
