SagaQA: A Multi-hop Reasoning Benchmark for Long-form Narrative Understanding in TV Series

ArXiv cs.CV, 2026-06-03. Authors: Galann Pennec, Zhengyuan Liu, Nicholas Asher, Philippe Muller, Nancy F. Chen

Abstract

arXiv:2606.03301v1 (announce type: cross)

We introduce SagaQA, a long-form video benchmark for multi-hop reasoning over full-length TV series. Existing video reasoning benchmarks often emphasize local understanding of adjacent frames or clips. SagaQA addresses this gap by requiring high-level comprehension of extended multimodal narratives in entire TV shows. A distinguishing feature of SagaQA is the granularity of its reasoning steps. Our dataset necessitates long-range reasoning hops to connect information across completely different episodes. This requires models to reason over entire events and actions, demanding a deep understanding of the show's narration and progression at a multimodal level. Motivated by recent progress in agentic methods, we further study how different planning strategies handle such complex reasoning.
