Why Your Deep Research Agent Fails? On Hallucination Evaluation in Full Research Trajectory (Event)

Name: Why Your Deep Research Agent Fails? On Hallucination Evaluation in Full Research Trajectory
Start: 2026-05-26

PRODUCT_LAUNCH, 2026-05-26, Impact: MEDIUM

arXiv:2601.22984v2. Announce Type: replace.

Abstract: Diagnosing failure patterns in Deep Research Agents (DRAs) remains a critical challenge. Existing benchmarks predominantly rely on end-to-end evaluation, obscuring intermediate hallucinations that accumulate throughout the research trajectory. To bridge this gap, we propose a shift from outcome-based to process-aware evaluation by auditing hallucinations

Artificial Intelligence
