ARBOR: Online Process Rewards via a Reusable Rubric Buffer for Search Agents

ArXiv CS.CL, 2026-06-03
Authors: Zheng Liu, Longxiang Zhang, Xintong Wang, Zhiang Xu, Shaoxiong Zhan, Xin Shan, Wen Huang, Tao Dai, Shu-Tao Xia, Chengfu Huo, Liang Ding

Related Technologies