OASES: Outcome-Aligned Search-Evaluation Co-Training for Agentic Search 文章

ArXiv CS.CL2026-05-26NEWSen作者: Erhan Zhang, Yiqun Chen, Zechun Niu, Wei Yang, Xiaochi Wei, Yan Gao, Yi Wu, Yao Hu, Jiaxin Mao

查看原文 →

关系图谱

摘要

arXiv:2604.03675v3 Announce Type: replace-cross Abstract: Agentic search enables language models to solve knowledge-intensive tasks by adaptively acquiring external evidence over multiple steps. Reinforcement learning with verifiable rewards (RLVR) has emerged as a widely adopted training paradigm for search agents, yet outcome-only rewards are sparse and provide limited credit assignment for intermediate search actions. Existing process-reward methods therefore seek to densify supervision through proxy signals, external evaluators, or likelihood-based information gain. However, proxy rewards can deviate from the final outcome objective, while fixed evaluators can become stale as the search policy evolves, leading to unreliable process supervision. To address these challenges, we propose OASES, an Outcome-Aligned Search-Evaluation Supervision framework for agentic search.

OASES: Outcome-Aligned Search-Evaluation Co-Training for Agentic Search 文章

摘要

相关事件查看全部 (1)

相关公司查看全部 (5)

相关人物

相关产品查看全部 (16)

相关技术查看全部 (23)