LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis (Event)

PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM

arXiv:2605.30434v1 | Announce Type: cross

Abstract: Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents' ability to track evolving analytical context over long horizons untested. We introduce LongDS, a benchmark for long-horizon, multi-turn data analysis where agents must maintain, update, restore, and compose evolving analytical states. Long