Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories (Event)

Name: Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
Start: 2026-06-02

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.02060v1
Announce Type: new
Abstract: Deep-research agents solve tasks through long trajectories of search, tool use, evidence inspection, and answer synthesis. Evaluation based on final answers shows whether an agent succeeds, but not which parts of the trajectory make the answer unreliable. We study span-level error localization for deep-research agents. We collect 2,790 real trajectories fro

Artificial Intelligence
