PrecisionCUA: Iterative Visual Refinement for Pixel-Precise Cursor Grounding in Code Editors

ArXiv CS.CV | 2026-05-28 | Authors: Himangi Mittal, Gaurav Mittal, Nelson Daniel Troncoso, Yu Hu

Abstract

arXiv:2604.13019v2 (announce type: replace)

Computer Use Agents (CUAs) rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions. However, editing-level grounding in coding interfaces such as VS Code and Cursor, where sub-pixel accuracy is required to interact with densely packed IDE elements, remains underexplored. Existing approaches typically rely on single-shot coordinate prediction, which has no mechanism for error correction and often fails in high-density interfaces. In this technical report, we conduct an empirical study of pixel-precise cursor localization in coding environments. Instead of executing in a single step, our agent engages in an iterative refinement process, using visual feedback from previous attempts to reach the target element. This closed-loop grounding mechanism allows the agent to self-correct displacement errors and adapt to dynamic UI changes.
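The closed-loop idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the model, the feedback signal, or the stopping rule, so the names `refine_cursor` and `make_toy_observer`, the `gain` parameter, and the pixel tolerance below are all hypothetical. The "visual feedback" is simulated by an observer that consistently underestimates the remaining displacement, which is enough to show why repeated correction can beat a single-shot prediction.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def make_toy_observer(target: Point, gain: float = 0.7) -> Callable[[Point], Point]:
    """Hypothetical stand-in for visual feedback.

    Returns a function that, given the current cursor position, reports an
    imperfect estimate of the displacement to the target (only `gain` of the
    true offset). A real agent would derive this from a screenshot.
    """
    def observe(pos: Point) -> Point:
        return (gain * (target[0] - pos[0]), gain * (target[1] - pos[1]))
    return observe


def refine_cursor(
    initial_guess: Point,
    observe_error: Callable[[Point], Point],
    max_steps: int = 5,
    tolerance: float = 1.0,
) -> Tuple[Point, List[Point]]:
    """Iteratively correct a cursor position using observed displacement.

    Stops when the observed displacement is within `tolerance` pixels on
    both axes, or after `max_steps` corrections.
    """
    x, y = initial_guess
    history = [(x, y)]
    for _ in range(max_steps):
        dx, dy = observe_error((x, y))
        if max(abs(dx), abs(dy)) <= tolerance:
            break  # feedback says we are close enough
        x, y = x + dx, y + dy  # apply the correction and look again
        history.append((x, y))
    return (x, y), history


target = (412.0, 233.0)
single_shot = (400.0, 220.0)  # assumed first prediction, 12-13 px off
final, history = refine_cursor(single_shot, make_toy_observer(target))
```

In this toy setting each correction shrinks the remaining error by a constant factor, so the loop ends closer to the target than the initial single-shot guess. The stopping rule trusts the observed displacement, so the true residual can be slightly larger than `tolerance`.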
