Unveiling the Visual Counting Bottleneck in Vision-Language Models | Event
PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM
arXiv:2605.30170v1 | Announce Type: cross
Abstract: While Large Vision-Language Models (VLMs) excel at interpolation, they suffer catastrophic failures in systematic generalization, most notably in visual counting. In this work, we investigate this extrapolation bottleneck by deconstructing visual counting into three cognitive stages: visual individuation, magnitude awareness, and symbolic mapping. Using synthetic Go boards and li
Related coverage (1)
Unveiling the Visual Counting Bottleneck in Vision-Language Models
ArXiv CS.CV | 2026-05-29