LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis (Event)
Type: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM
arXiv:2605.30434v1 | Announce Type: cross

Abstract: Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents' ability to track evolving analytical context over long horizons untested. We introduce LongDS, a benchmark for long-horizon, multi-turn data analysis where agents must maintain, update, restore, and compose evolving analytical states. Long…
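The abstract names four operations on analytical state (maintain, update, restore, compose) but does not define how LongDS represents that state. Purely as a hypothetical illustration, and not the benchmark's task format or any agent's implementation, the sketch below models the state after each turn as a set of active filter conditions over a toy table. The class `StateTracker`, the condition tuples, and the sample rows are all invented for this example.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

# Hypothetical types: a row maps column names to numbers; a condition is (column, operator, threshold).
Row = dict[str, float]
Condition = tuple[str, str, float]

OPS: dict[str, Callable[[float, float], bool]] = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}


@dataclass
class StateTracker:
    """Hypothetical tracker: history[i] holds the full set of active conditions after turn i."""

    history: list[frozenset[Condition]] = field(default_factory=lambda: [frozenset()])

    @property
    def current(self) -> frozenset[Condition]:
        return self.history[-1]

    def update(self, cond: Condition) -> None:
        # Maintain + update: the new state keeps every prior condition and adds one.
        self.history.append(self.current | {cond})

    def restore(self, turn: int) -> None:
        # Restore: re-activate the exact state recorded after an earlier turn.
        self.history.append(self.history[turn])

    def compose(self, turn_a: int, turn_b: int) -> None:
        # Compose: merge the conditions of two earlier states into a new one.
        self.history.append(self.history[turn_a] | self.history[turn_b])

    def apply(self, rows: list[Row]) -> list[Row]:
        # Keep only rows that satisfy every condition in the current state.
        return [r for r in rows if all(OPS[op](r[col], v) for col, op, v in self.current)]


# Usage: a short multi-turn session over invented toy data.
rows = [{"age": 25, "income": 40}, {"age": 40, "income": 90}, {"age": 55, "income": 60}]
tracker = StateTracker()
tracker.update(("age", ">", 30))       # turn 1: {age > 30}
tracker.update(("income", ">", 70))    # turn 2: {age > 30, income > 70}
after_turn2 = tracker.apply(rows)
tracker.restore(1)                     # turn 3: back to {age > 30}
after_restore = tracker.apply(rows)
tracker.restore(0)                     # turn 4: no filters
tracker.update(("income", "<", 80))    # turn 5: {income < 80}
tracker.compose(1, 5)                  # turn 6: {age > 30, income < 80}
after_compose = tracker.apply(rows)
print(after_turn2, after_restore, after_compose)
```

Recording the full state per turn, rather than only the latest change, is what makes "restore turn 1" and "combine turns 1 and 5" simple lookups here; an agent that keeps only the most recent context has no such record to fall back on.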
Source: arXiv cs.CL, 2026-06-01