Q-GeoMem: Question-Guided Geometric Memory for Video Spatial Reasoning

arXiv cs.CV | 2026-05-27 | Authors: Xianqiang Gao, Qizhi Chen, Delin Qu, Haoming Song, Zhigang Wang, Bin Zhao, Dong Wang, Xuelong Li

Abstract

arXiv:2605.27318v1 (announce type: new). Video spatial reasoning requires accumulating viewpoint-dependent evidence over time while retaining the information that is useful for the question being asked. Existing spatial video-language models improve geometric perception and long-range context modeling, but they often treat memory as a generic temporal cache, which can introduce redundant or irrelevant geometry and weaken long-horizon reasoning. We propose Q-GeoMem, a question-guided geometric memory framework for video spatial reasoning. Q-GeoMem injects camera-conditioned geometry into visual tokens and maintains two complementary memories: a Fine-Grained Context Bank for recent dense features and camera states, and a Semantic-Geometric Evidence Bank for compact long-range evidence. Each candidate frame is scored by the product of Q-Former-based question relevance and novelty with respect to the retained bank;
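The scoring rule in the last sentence can be illustrated with a minimal sketch. The abstract only states that the score is the product of question relevance and novelty; everything else here is an assumption made for illustration. In particular, `relevance` uses plain cosine similarity as a hypothetical stand-in for the Q-Former-based relevance, `novelty` is assumed to be one minus the highest similarity to any retained entry, and all function names are invented for this example rather than taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance(frame: Vector, question: Vector) -> float:
    """ASSUMPTION: stand-in for Q-Former relevance, cosine rescaled to [0, 1]."""
    return 0.5 * (1.0 + cosine(frame, question))


def novelty(frame: Vector, bank: List[Vector]) -> float:
    """ASSUMPTION: 1 minus the highest (clipped at 0) similarity to the bank."""
    if not bank:
        return 1.0  # nothing retained yet, so any frame is fully novel
    max_sim = max(max(0.0, cosine(frame, kept)) for kept in bank)
    return 1.0 - max_sim


def frame_score(frame: Vector, question: Vector, bank: List[Vector]) -> float:
    """Product of relevance and novelty, as described in the abstract."""
    return relevance(frame, question) * novelty(frame, bank)


if __name__ == "__main__":
    question = [1.0, 0.0]
    bank = [[1.0, 0.0]]  # one frame feature already retained
    for name, frame in [("duplicate", [1.0, 0.0]),
                        ("partly new", [1.0, 1.0]),
                        ("off-topic", [-1.0, 0.0])]:
        print(name, round(frame_score(frame, question, bank), 3))
```

Because the two factors are multiplied, either one can veto a frame: a frame that duplicates something already in the bank scores zero however relevant it is, and a frame unrelated to the question scores zero however novel it is. How the score is then used to update the banks is not covered by the text above, so the sketch stops at scoring.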
