TimeSage-MT: A Multi-Turn Benchmark for Evaluating Agentic Time Series Reasoning

Event: PRODUCT_LAUNCH
Date: 2026-06-02
Impact: MEDIUM

arXiv:2606.01498v1 (announce type: new)

Abstract: Time series data inform critical decisions across many real-world domains. While large language model (LLM) agents can analyze data through natural language and tools, it remains unclear whether they can conduct reliable time series analysis across multi-turn conversations. Existing benchmarks focus on single-step tasks such as forecasting and anomaly detection, overl