OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories
Event: PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM
arXiv:2605.29253v1 Announce Type: new
Abstract: Task success can hide process anomalies in real-world agent executions. An agent may pass the final task oracle while still accumulating unresolved ambiguity, unsafe external writes, ignored errors, weakly grounded commitments, or capability-boundary overcommitment. We study this mismatch as the Outcome-Process Gap and introduce OpenClawBench, a large-sca