Getting to the Point: Pointing Improves LVLMs at Counting (Event)

BREAKTHROUGH | 2026-05-29 | Impact: HIGH

arXiv:2603.21746v2 | Announce type: replace

Abstract: Pointing-based methods decompose complex tasks into sequential grounding and reasoning steps. Given a query, the model first grounds the relevant objects by generating their coordinates, and then predicts an answer conditioned on these points. While this approach has been shown to increase the performance of Large Vision-Language Models (LVLMs), it remains unclear why and how it improves