Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding

Event: PRODUCT_LAUNCH | 2026-06-05 | Impact: MEDIUM

arXiv:2512.05774v2 (Announce Type: replace)

Abstract: Long video understanding (LVU) is challenging because answering real-world queries often depends on sparse, temporally dispersed cues buried in hours of mostly redundant and irrelevant content. While agentic pipelines improve video reasoning capabilities, prevailing frameworks rely on a query-agnostic captioner to perceive video information, which wastes c