STORM: Internalized Modeling for Spatial-Temporal Reasoning in Video-Language Models
arXiv cs.CV, 2026-05-26
Authors: Yiming Liang, Yixiao Chen, Yiyang Zhou, Yixuan Wang, Shoubin Yu, Andong Deng, Fuxiao Liu, Qin Zhang, Chen Chen, Mohit Bansal, Huaxiu Yao