VisualNeedle: Benchmarking Active Visual Search in Information-Dense Scenes
PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM
arXiv:2605.26380v1 (Announce Type: new)

Abstract: Frontier multimodal large language models (MLLMs) have been reported to achieve over 90% accuracy on fine-grained perception benchmarks. However, such scores do not necessarily imply faithful use of visual evidence. Prior studies have identified three shortcuts that inflate benchmark performance. First, linguistic priors and lexical cues in questions often enable models to …
Source: arXiv cs.CV, 2026-05-27