Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation [Event]
OPEN_SOURCE | 2026-05-27 | Impact: MEDIUM
arXiv:2605.27134v1
Announce Type: new
Abstract: Vision-Language Models (VLMs) have shown rapid progress in mobile GUI navigation. This paper presents a systematic study of data scaling, benchmarking, and reasoning for VLM-based agents in this domain. To facilitate rigorous evaluation, we introduce HyperTrack, a large-scale dataset with over 16,000 real-world tasks across more than 650 Chinese mobile applicat