MobiBench: Multi-Branch, Modular Benchmark for Mobile GUI Agents

arXiv cs.AI | 2026-06-02 | News | en
Authors: Youngmin Im, Byeongung Jo, Jaeyoung Wi, Seungwoo Baek, Tae Hoon Min, Joo Hyung Lee, Sangeun Oh, Insik Shin, Sunjae Lee


Abstract

arXiv:2512.12634v4 (announce type: replace)

Mobile GUI agents, AI agents that interact with mobile applications on behalf of users, have the potential to transform human-computer interaction. However, current evaluation practices for GUI agents face two fundamental limitations. First, they rely on either single-path offline benchmarks or live online benchmarks. Offline benchmarks built on static datasets annotated with a single path unfairly penalize valid alternative actions, while online benchmarks suffer from poor scalability and reproducibility because live evaluation is dynamic and unpredictable. Second, existing benchmarks treat agents as monolithic black boxes and overlook the contributions of individual components, which often leads to unfair comparisons or obscures key performance bottlenecks.
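The first limitation, single-path annotation penalizing valid alternatives, can be made concrete with a toy example. The following is a minimal sketch under stated assumptions: the screen names, the `(action_type, target)` action encoding, the state-graph representation of "multiple valid branches", and the helper names `single_path_accuracy` and `multi_branch_eval` are all hypothetical illustrations, not MobiBench's actual data format or scoring rules. It only shows why exact matching against one annotated trajectory can score a successful agent as wrong.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: an action is (action_type, target).
Action = Tuple[str, str]
# Hypothetical multi-branch annotation: state -> {valid action: next state}.
Graph = Dict[str, Dict[Action, str]]


def single_path_accuracy(predicted: List[Action], reference: List[Action]) -> float:
    """Step-wise exact match against a single annotated trajectory."""
    if not reference:
        return 0.0
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def multi_branch_eval(
    predicted: List[Action], graph: Graph, start: str, goal: str
) -> Tuple[float, bool]:
    """Replay actions through the graph of accepted branches.

    Returns (fraction of predicted steps that were valid, whether the goal
    state was reached). Replay stops at the first action that is not a
    valid edge from the current state.
    """
    if not predicted:
        return 0.0, start == goal
    state, hits = start, 0
    for action in predicted:
        nxt = graph.get(state, {}).get(action)
        if nxt is None:
            break  # invalid action: stop crediting further steps
        hits += 1
        state = nxt
    return hits / len(predicted), state == goal


# Toy task (hypothetical): open Wi-Fi settings via either of two valid routes.
GRAPH: Graph = {
    "home": {("tap", "Settings"): "settings", ("swipe", "down"): "quick_panel"},
    "settings": {("tap", "Network"): "network"},
    "network": {("tap", "Wi-Fi"): "wifi"},
    "quick_panel": {("long_press", "Wi-Fi"): "wifi"},
}
REFERENCE: List[Action] = [("tap", "Settings"), ("tap", "Network"), ("tap", "Wi-Fi")]
PREDICTED: List[Action] = [("swipe", "down"), ("long_press", "Wi-Fi")]

if __name__ == "__main__":
    print(single_path_accuracy(PREDICTED, REFERENCE))            # 0.0
    print(multi_branch_eval(PREDICTED, GRAPH, "home", "wifi"))   # (1.0, True)
```

Here the agent reaches the goal through the quick-settings panel, a route the single annotated path never lists, so exact matching gives it zero credit while the branch-aware replay accepts every step. Encoding alternatives as a graph rather than a per-step set of allowed actions is a deliberate choice in this sketch: once an agent takes a different branch, the screens it sees next also differ.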
