VTI-CoT: Visual-Textual Interleaved Chain of Thought for Video Reasoning | Event

PRODUCT_LAUNCH | 2026-06-05 | Impact: MEDIUM

arXiv:2606.05736v1 | Announce Type: new

Abstract: Video reasoning aims to understand complex temporal events and causal relationships within videos. Recently, Chain-of-Thought (CoT) reasoning has been introduced to this field to improve reasoning accuracy. However, existing CoT-based video reasoning methods rely primarily on text-only information for logical deduction, overlooking critical visual information during the inference process…