Unveiling the Visual Counting Bottleneck in Vision-Language Models

Event: PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM

arXiv:2605.30170v1 Announce Type: cross

Abstract: While Large Vision-Language Models (VLMs) excel at interpolation, they suffer catastrophic failures in systematic generalization, most notably in visual counting. In this work, we investigate this extrapolation bottleneck by deconstructing visual counting into three cognitive stages: visual individuation, magnitude awareness, and symbolic mapping. Using synthetic Go boards and li