STORM: Internalized Modeling for Spatial-Temporal Reasoning in Video-Language Models [Event]
PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM
arXiv:2605.26014v1 | Announce Type: new

Abstract: Many video reasoning tasks require tracking motion, temporal order, and evolving visual states across frames. Existing methods built on large vision-language models (LVLMs) often address this challenge by externalizing reasoning through textual chain-of-thought (CoT), keyframe selection, repeated frame reinsertion, or external tool use. While effective, such pipelines…
Source: ArXiv CS.CV, 2026-05-26