VisualNeedle: Benchmarking Active Visual Search in Information-Dense Scenes (Event)

PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2605.26380v1 (Announce Type: new)

Abstract: Frontier multimodal large language models (MLLMs) have been reported to achieve over 90% accuracy on fine-grained perception benchmarks. However, such scores do not necessarily imply faithful use of visual evidence. Prior studies have identified three shortcuts that inflate benchmark performance. First, linguistic priors and lexical cues in questions often enable models to …