Why Your Deep Research Agent Fails? On Hallucination Evaluation in Full Research Trajectory
Event: PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM
arXiv:2601.22984v2 Announce Type: replace
Abstract: Diagnosing failure patterns in Deep Research Agents (DRAs) remains a critical challenge. Existing benchmarks rely predominantly on end-to-end evaluation, which obscures the intermediate hallucinations that accumulate throughout the research trajectory. To bridge this gap, we propose a shift from outcome-based to process-aware evaluation by auditing hallucinations