Beyond the Final Answer: Evaluating the Reasoning Trajectories of Tool-Augmented Agents

ArXiv CS.CL | 2026-05-26 | NEWS | en | Authors: Wonjoong Kim, Sangwu Park, Yeonjun In, Sein Kim, Dongha Lee, Chanyoung Park

Details

Source site: ArXiv CS.CL
Authors: Wonjoong Kim, Sangwu Park, Yeonjun In, Sein Kim, Dongha Lee, Chanyoung Park
Article type: NEWS
Language: en
Publication date: 2026-05-26

Abstract

arXiv:2510.02837v3 (announce type: replace-cross)

Although recent tool-augmented benchmarks involve complex requests, evaluation remains limited to answer matching, neglecting critical trajectory aspects such as efficiency, hallucination, and adaptivity. The most straightforward evaluation method is to compare an agent's trajectory with a ground-truth trajectory, but annotating all valid ground-truth trajectories is prohibitively expensive. To address this, we introduce TRACE, a reference-free framework for the multi-dimensional evaluation of tool-augmented LLMs. By incorporating an evidence bank that accumulates knowledge from preceding steps, TRACE assesses an agent's reasoning trajectory effectively. To validate our framework, we develop a new meta-evaluation dataset of diverse, flawed trajectories, each labeled with multi-faceted performance scores. Our results confirm that TRACE accurately evaluates complex trajectories even with small open-source LLMs.
