FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search

ArXiv CS.CL | 2026-06-02 | Authors: James Xu Zhao, Hui Chen, Bryan Hooi, See-Kiong Ng

Abstract

arXiv:2606.00660v1 (announce type: new)

Agentic search requires language model agents to explore many sources and answer complex information-seeking questions. Scaling test-time compute is a promising way to improve these agents, but current approaches can fail, because correct answers are often sparse and score-based selection depends on model calibration. We propose FineVerify, a fine-grained self-verification framework that decomposes each question into checkable sub-questions, verifies sampled candidates against each sub-question, and selects the candidate with the highest aggregated score. This per-check structure turns selection into simpler local judgments and produces scores under the same explicit criteria. Across four agentic search benchmarks and two models, FineVerify consistently outperforms standard scaling baselines. With only four sampled trajectories, it improves GPT-5-mini by 8.2 accuracy points and Gemini-3-flash by 5.6% on average.
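The selection step described in the abstract (score each sampled candidate against every sub-question, aggregate, pick the highest) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_candidate`, the `verify` callable, the use of a mean as the aggregate, and the keyword-matching toy verifier are all assumptions made for illustration. In practice the verifier would presumably be a language model call, and the sub-questions would come from a decomposition step that is not shown here.

```python
from typing import Callable, List, Sequence, Tuple


def select_candidate(
    sub_questions: Sequence[str],
    candidates: Sequence[str],
    verify: Callable[[str, str], float],
) -> Tuple[str, List[float]]:
    """Return the candidate with the highest aggregated per-check score.

    `verify(sub_question, candidate)` is a hypothetical scorer returning a
    value in [0, 1]. Averaging the checks is an assumption; the abstract
    only says the scores are aggregated.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    totals: List[float] = []
    for cand in candidates:
        # Each check is a local judgment: one candidate, one sub-question.
        scores = [verify(sq, cand) for sq in sub_questions]
        totals.append(sum(scores) / len(scores) if scores else 0.0)
    # Ties resolve to the earliest candidate, because max() keeps the first maximum.
    best = max(range(len(candidates)), key=totals.__getitem__)
    return candidates[best], totals


def toy_verify(sub_question: str, candidate: str) -> float:
    """Toy stand-in for a model-based verifier: a plain keyword match."""
    return 1.0 if sub_question.lower() in candidate.lower() else 0.0


# Hypothetical sub-questions and sampled answers, for illustration only.
checks = ["paris", "1889"]
samples = [
    "Eiffel Tower, Lyon, 1889",
    "Eiffel Tower, Paris, 1889",
    "Big Ben, London",
]
best_answer, per_candidate = select_candidate(checks, samples, toy_verify)
print(best_answer, per_candidate)
```

Because every candidate is scored against the same list of checks, the totals are directly comparable across candidates, which is the property the abstract attributes to the per-check structure.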
