GenSpan: Generation-Calibrated Motion Span Priors for Multi-Verb Video Corpus Moment Retrieval 文章

ArXiv CS.CV2026-06-04NEWSen作者: Yunzhuo Sun, Xinyue Liu, Yanyang Li, Nanding Wu, Linlin Zong, Xianchao Zhang, Wenxin Liang

查看原文 →

关系图谱

摘要

arXiv:2603.22121v2 Announce Type: replace Abstract: Video Corpus Moment Retrieval (VCMR) aims to retrieve both the correct video and its temporal segment corresponding to a natural-language query, a task that is especially challenging for multi-verb queries where temporal action ordering is critical. Existing approaches often rely solely on text or static images and struggle to capture implicit motion dynamics, leading to retrieval errors and temporal misalignment. We propose GenSpan, a generation-calibrated VCMR framework that constructs short auxiliary videos from LLM-selected subtitle cues and decomposed sub-events, using these as temporal priors rather than direct retrieval targets. A token selector filters candidate-video features aligned with generated motion, and a bidirectional state-space model efficiently predicts video-moment tuples.

GenSpan: Generation-Calibrated Motion Span Priors for Multi-Verb Video Corpus Moment Retrieval 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品查看全部 (2)

相关技术查看全部 (4)