VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes

Event | BREAKTHROUGH | 2026-05-26 | Impact: HIGH

arXiv:2509.25339v3 (Announce Type: replace)

Abstract: Is basic visual understanding really solved in state-of-the-art VLMs? We present VisualOverload, a slightly different visual question answering (VQA) benchmark comprising 2,720 question-answer pairs, with privately held ground-truth responses. Unlike prior VQA datasets that typically focus on near-global image understanding, VisualOverload challenges models to perform …
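The abstract states only that ground-truth responses are held privately. It does not describe the submission format or the metric. As an illustration, here is a minimal Python sketch of how a held-out-label VQA benchmark could be scored on the organizers' side. It assumes answers are keyed by a question ID and compared by normalized exact match. `normalize`, `score_predictions`, and the ID scheme are hypothetical and do not represent the authors' protocol.

```python
from typing import Dict


def normalize(answer: str) -> str:
    """Lowercase, trim whitespace, and drop trailing periods (an assumed normalization)."""
    return answer.strip().lower().rstrip(".")


def score_predictions(predictions: Dict[str, str], hidden_truth: Dict[str, str]) -> float:
    """Return exact-match accuracy over every held-out question.

    `hidden_truth` stays with the scorer and is never shared with submitters.
    Questions missing from `predictions` count as wrong, so a partial
    submission cannot inflate its score.
    """
    if not hidden_truth:
        raise ValueError("hidden_truth must not be empty")
    correct = sum(
        1
        for qid, truth in hidden_truth.items()
        if qid in predictions and normalize(predictions[qid]) == normalize(truth)
    )
    return correct / len(hidden_truth)


if __name__ == "__main__":
    # Toy data with made-up IDs and answers (not taken from VisualOverload).
    truth = {"q1": "7", "q2": "yes", "q3": "red"}
    preds = {"q1": "7", "q2": "No", "q3": "Red."}
    print(f"accuracy = {score_predictions(preds, truth):.3f}")  # prints "accuracy = 0.667"
```

Dividing by the size of the hidden answer set, not by the number of submitted answers, keeps scores comparable across submissions. This matters when only the organizers can see which questions exist.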