SWE-Marathon: Can Agents Autonomously Complete Ultra-Long-Horizon Software Work? (Event)

PRODUCT_LAUNCH | 2026-06-09 | Impact: MEDIUM

arXiv:2606.07682v1 | Announce Type: cross

Abstract: AI agents are increasingly expected to complete long-horizon workflows that require sustained progress over hours, millions of tokens, and complex environments. Yet current agent benchmarks largely evaluate short-form tasks, such as single pull requests, small tickets, or 5-10 minute exercises, limiting our ability to measure agents' capabilities in planning, long-c