Learning GUI Grounding with Spatial Reasoning from Visual Feedback (Event)
PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM
arXiv:2509.21552v2 · Announce Type: replace
Abstract: Graphical User Interface (GUI) grounding is commonly framed as a coordinate prediction task: given a natural language instruction, generate on-screen coordinates for actions such as clicks and keystrokes. However, recent Vision Language Models (VLMs) often fail to predict accurate numeric coordinates when processing GUI images with high resolutions and complex layouts. To add
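To make the coordinate-prediction framing concrete, here is a minimal sketch under our own assumptions; it is not the paper's method. The names `Box`, `rescale`, and `is_hit` are hypothetical. The sketch assumes the model predicts a click point on a resized copy of a high-resolution screenshot, and that a click counts as correct when it lands inside the target element's bounding box (a common evaluation convention, assumed here rather than taken from the source).

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical target-element bounding box, in screen pixels."""
    left: float
    top: float
    right: float
    bottom: float


def rescale(x: float, y: float, model_size: tuple, screen_size: tuple) -> tuple:
    """Map a point predicted on the resized model input back to screen pixels.

    model_size and screen_size are (width, height) pairs (assumed layout).
    """
    mw, mh = model_size
    sw, sh = screen_size
    return x * sw / mw, y * sh / mh


def is_hit(x: float, y: float, box: Box) -> bool:
    """Assumed scoring rule: the click is correct if it falls inside the box."""
    return box.left <= x <= box.right and box.top <= y <= box.bottom


if __name__ == "__main__":
    # Point predicted on a 1280x720 input, mapped to a 2560x1440 screen.
    sx, sy = rescale(640, 360, (1280, 720), (2560, 1440))
    print((sx, sy), is_hit(sx, sy, Box(1200, 700, 1400, 760)))
```

For example, the point (640, 360) on a 1280×720 input maps to (1280.0, 720.0) on a 2560×1440 screen. Any error in the model-space coordinates is scaled by the same factor of 2, so a small numeric miss on the resized image grows in screen pixels.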
Related coverage (1)
Learning GUI Grounding with Spatial Reasoning from Visual Feedback
ArXiv CS.CV · 2026-05-27