How Auxiliary Reasoning Unleashes GUI Grounding in VLMs (Event)

OPEN_SOURCE | 2026-06-11 | Impact: MEDIUM

arXiv:2509.11548v2 (Announce Type: replace)

Abstract: Graphical user interface (GUI) grounding is a fundamental task for building GUI agents. However, general vision-language models (VLMs) struggle with this task because they lack task-specific optimization. In this paper, we identify a key gap: while VLMs exhibit significant latent grounding potential, as measured by the Pointing Game, they underperform when tasked wi…
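The abstract cites the Pointing Game as evidence of latent grounding ability. The standard form of that metric counts a "hit" when the highest-scoring location in a 2D relevance map falls inside the target's bounding box. The sketch below illustrates this generic metric only. It is not the authors' implementation, and the excerpt does not say how the score map is derived from the VLM. The function names `pointing_game_hit` and `pointing_game_accuracy`, and the assumption that the map and box share pixel coordinates, are hypothetical.

```python
import numpy as np


def pointing_game_hit(score_map, box):
    """Return True if the argmax of score_map lies inside box.

    score_map: 2D array (H, W) of relevance scores. Assumption: already
        aligned to the screenshot's pixel grid.
    box: (x0, y0, x1, y1) target-element bounds in the same coordinates,
        inclusive on both ends.
    """
    # np.argmax flattens the array; unravel_index recovers (row, col) = (y, x).
    y, x = np.unravel_index(np.argmax(score_map), score_map.shape)
    x0, y0, x1, y1 = box
    return bool(x0 <= x <= x1 and y0 <= y <= y1)


def pointing_game_accuracy(score_maps, boxes):
    """Fraction of (map, box) pairs that count as Pointing Game hits."""
    hits = [pointing_game_hit(m, b) for m, b in zip(score_maps, boxes)]
    if not hits:
        raise ValueError("need at least one (score_map, box) pair")
    return sum(hits) / len(hits)


# Usage: one map peaks inside the box and one peaks outside it.
hit_map = np.zeros((4, 6))
hit_map[1, 4] = 1.0   # peak at x=4, y=1
miss_map = np.zeros((4, 6))
miss_map[3, 0] = 1.0  # peak at x=0, y=3
target = (3, 0, 5, 2)
print(pointing_game_accuracy([hit_map, miss_map], [target, target]))
```

Because only the argmax matters, the metric checks whether the model "looks at" the right element. It ignores whether the model can state that location as explicit coordinates, which is the kind of gap the abstract describes.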