SaaS-Bench: Can Computer-Use Agents Leverage Real-World SaaS to Solve Professional Workflows?
Event: PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM
arXiv:2605.15777v2
Announce Type: replace
Abstract: Computer-Using Agents (CUAs) are rapidly extending large language models (LLMs) beyond text-based reasoning toward action execution in more complex environments, such as web browsers and graphical user interfaces (GUIs). However, existing web and GUI agent benchmarks often rely on simplified settings, isolated tasks, or short-horizon interactions, mak