MacArena: Benchmarking Computer Use Agents on an Online macOS Environment · Event

PRODUCT_LAUNCH · 2026-06-08 · Impact: MEDIUM

arXiv:2606.06560v1 · Announce Type: cross

Abstract: Computer-use agents (CUAs) operate graphical user interfaces (GUIs) through vision and control primitives, and their capabilities have advanced rapidly, driven in part by standardized online evaluation benchmarks such as OSWorld, which serve both as evaluation tools and as training environments for reinforcement learning. However, macOS remains underserved in this landscape…

MacArena: Benchmarking Computer Use Agents on an Online macOS Environment · Related Technologies