Pop-Up Distractions Reveal Bag-of-Events Behavior in Video Large Language Models [Event]

PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM

arXiv:2605.27101v1. Announce Type: new. Abstract: A key capability for video understanding is reliably linking subjects to events across time, yet whether Video Large Language Models (VideoLLMs) actually achieve this remains unclear. In this work, we introduce DistractionBench to evaluate whether VideoLLMs can robustly link subjects and events in the presence of unrelated video segments. Through controlled interventi