Revisiting Observation Reduction for Web Agents: Comprehensive Evaluation with a Lightweight Framework

ArXiv CS.CL, 2026-05-29. Authors: Masafumi Enomoto, Ryoma Obara, Haochen Zhang, Masafumi Oyamada

Abstract

arXiv:2605.29397v1

HTML observations in LLM-based web agents are extremely long. Many reduction methods have been proposed, but it remains unclear which of them reduce overall agent latency while maintaining performance. The main obstacle is the high cost of end-to-end evaluation: in our experiments, evaluating 11 methods across 32 configurations on 33 tasks of WorkArena L1 required 232.4 cumulative hours. To address this, we propose a lightweight evaluation framework based on the Minimal Failure Set (MFS), the minimal set of HTML elements whose removal causes task failure. We define coverage as the fraction of instances in which a reduction method fully retains the MFS. Coverage serves as a proxy metric that requires neither web access nor LLM inference. We validate that coverage strongly correlates with end-to-end success rate, with over 100× speedup in cumulative evaluation time on both benchmarks.
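The coverage definition above can be illustrated with a minimal sketch. It assumes that each HTML element is identified by a string ID and that the MFS for each instance has already been computed; the abstract specifies neither, so the names `retains_mfs` and `coverage` and the data layout are hypothetical and are not the authors' implementation.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: each instance is a pair
# (retained element IDs after reduction, MFS element IDs).
Instance = Tuple[Set[str], Set[str]]


def retains_mfs(retained: Set[str], mfs: Set[str]) -> bool:
    """Return True if every MFS element survives the reduction."""
    return mfs <= retained  # subset test


def coverage(instances: Iterable[Instance]) -> float:
    """Fraction of instances whose MFS is fully retained."""
    items = list(instances)
    if not items:
        raise ValueError("coverage is undefined for an empty instance list")
    hits = sum(1 for retained, mfs in items if retains_mfs(retained, mfs))
    return hits / len(items)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [
        ({"a", "b", "c"}, {"a", "b"}),  # MFS fully retained
        ({"a"}, {"a", "b"}),            # element "b" was dropped
        ({"x", "y"}, {"y"}),            # MFS fully retained
        ({"x", "z"}, {"x", "z"}),       # MFS fully retained
    ]
    print(coverage(toy))  # 3 of 4 instances retained -> 0.75
```

Because the check is a set comparison over precomputed data, it needs neither a browser session nor an LLM call, which is consistent with the abstract's description of coverage as a cheap proxy metric.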