How Auxiliary Reasoning Unleashes GUI Grounding in VLMs (Event)
OPEN_SOURCE | 2026-06-11 | Impact: MEDIUM
arXiv:2509.11548v2. Announce Type: replace.
Abstract: Graphical user interface (GUI) grounding is a fundamental task for building GUI agents. However, general vision-language models (VLMs) struggle with this task due to a lack of specific optimization. We identify a key gap in this paper: while VLMs exhibit significant latent grounding potential, as demonstrated by their performance measured by Pointing Game, they underperform when tasked wi
ArXiv CS.CV | 2026-06-11