Evaluating Deep Research Agents on Expert Consulting Work: A Benchmark with Verifiers, Rubrics, and Cognitive Traps

ArXiv CS.AI · 2026-06-02 · Authors: Tanmay Asthana, Aman Saksena, Divyansh Sahu
