VidPrism: Heterogeneous Mixture of Experts for Image-to-Video Transfer

arXiv cs.CV | 2026-05-28 | Authors: Rui Lin, Chuanming Wang, Huadong Ma

Abstract

arXiv:2605.28229v1. With the rapid development of pre-training technologies, adapting large-scale Vision-Language Models (VLMs) to video understanding, i.e., image-to-video transfer learning, has become a dominant paradigm. Among recent advances, employing Mixture-of-Experts (MoE) to enhance VLMs' temporal modeling capabilities has emerged as an effective strategy for achieving superior performance. However, conventional MoE designs suffer from expert homogenization, where all experts act as identical generalists, inefficiently learning spatio-temporal features from undifferentiated video streams. To overcome this problem, we propose VidPrism, a novel heterogeneous temporal Mixture-of-Experts framework. VidPrism pioneers a division of labor by deploying functionally specialized experts, each assuming a role ranging from spatial understanding to temporal modeling.
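
To make the contrast with homogeneous experts concrete, the following is a toy sketch of a heterogeneous mixture in plain Python. It is not the VidPrism architecture: the abstract does not specify the experts or the router, so the two experts here (`spatial_expert`, `temporal_expert`), the linear softmax gate, and every name in the code are illustrative assumptions.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_expert(frames, t):
    # Hypothetical "spatial" expert: sees only frame t, like an image model.
    return list(frames[t])

def temporal_expert(frames, t):
    # Hypothetical "temporal" expert: sees the change from the previous frame.
    prev = frames[t - 1] if t > 0 else frames[t]
    return [a - b for a, b in zip(frames[t], prev)]

# Heterogeneous: each expert has a different function and input view.
EXPERTS = [spatial_expert, temporal_expert]

def route(frame, gate_weights):
    # Assumed linear gate: one weight vector per expert, softmax over logits.
    logits = [sum(w * x for w, x in zip(ws, frame)) for ws in gate_weights]
    return softmax(logits)

def hetero_moe(frames, gate_weights):
    # frames: list of per-frame feature vectors (lists of floats).
    outputs = []
    for t in range(len(frames)):
        gates = route(frames[t], gate_weights)
        expert_outs = [expert(frames, t) for expert in EXPERTS]
        dim = len(frames[t])
        mixed = [
            sum(g * out[i] for g, out in zip(gates, expert_outs))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs
```

With all-zero gate weights the router splits evenly, so each output is the average of the per-frame feature and the frame difference. In a homogeneous MoE, by contrast, every entry in `EXPERTS` would be the same kind of function applied to the same input.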
