STORM: Internalized Modeling for Spatial-Temporal Reasoning in Video-Language Models · Event
PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM
arXiv:2605.26014v1 · Announce Type: new
Abstract: Many video reasoning tasks require tracking motion, temporal order, and evolving visual states across frames. Existing methods built on large vision-language models (LVLMs) often address this challenge by externalizing reasoning through textual chain-of-thought (CoT), keyframe selection, repeated frame reinsertion, or external tool use. While effective, such pipelines …
Related coverage
STORM: Internalized Modeling for Spatial-Temporal Reasoning in Video-Language Models
arXiv cs.CV · 2026-05-26