Abstract
arXiv:2605.02503v3. Autonomous data analysis agents are increasingly expected to conduct exploratory analysis with limited human guidance about the data. However, existing benchmarks typically evaluate such agents in prior-guided settings that provide pre-selected data sources, explicit data schemas, or cleaned data, thereby understating the exploratory burden. To evaluate this more realistic exploratory data analysis task, we introduce DataClawBench, a benchmark built from financial think-tank consulting scenarios in which agents must independently explore unfamiliar, noisy, cross-domain data and produce verifiable conclusions. DataClawBench provides a unified real-world data environment with approximately 2.06 million records across enterprise, industry, and policy domains, with native data noise preserved.