Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.02522v1 (Announce Type: new)

Abstract: Video multimodal large language models (MLLMs) have made rapid progress on general and long-form video understanding, yet their ability to preserve brief answer-critical visual evidence remains underexplored. Many practical questions are determined by momentary visual events: localized actions or state transitions that may last only a few frames. Such evidence can