AgentAtlas: Beyond Outcome Leaderboards for LLM Agents · Event

PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2605.20530v2 · Announce Type: replace-cross

Abstract: Large language model agents now act on codebases, browsers, operating systems, calendars, files, and tool ecosystems, but their evaluations often collapse behavior into final task success. AgentAtlas reframes agent evaluation as a diagnostic vocabulary and audit protocol for separating outcome success from control-decision quality and trajectory quality. The paper contributes: (i) a …
