VidPrism: Heterogeneous Mixture of Experts for Image-to-Video Transfer

arXiv cs.CV | 2026-05-28 | Authors: Rui Lin, Chuanming Wang, Huadong Ma

Abstract

arXiv:2605.28229v1

With the rapid development of pre-training technologies, adapting large-scale Vision-Language Models (VLMs) for video understanding, i.e., image-to-video transfer learning, has become a dominant paradigm. Among recent advances, employing Mixture-of-Experts (MoE) to enhance VLMs' temporal modeling capabilities has emerged as an effective strategy for improving performance. However, conventional MoE designs suffer from expert homogenization, where all experts act as identical generalists, inefficiently learning spatio-temporal features from undifferentiated video streams. To overcome this problem, we propose VidPrism, a novel heterogeneous temporal Mixture-of-Experts framework. VidPrism pioneers a division of labor by deploying functionally specialized experts, each assuming a role ranging from spatial understanding to temporal modeling.
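The abstract does not specify the expert architectures or the routing mechanism, so the following is only a toy sketch of the general idea of heterogeneous experts, not VidPrism itself. Every name here (`spatial_expert`, `temporal_expert`, `heterogeneous_moe`, `gate_logits`) is hypothetical, the experts are hand-written statistics rather than learned networks, and the gate logits are passed in directly instead of being produced by a trained router.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_expert(frames):
    """Hypothetical 'spatial' expert: averages each feature over frames,
    so the result does not depend on frame order."""
    t, d = len(frames), len(frames[0])
    return [sum(f[i] for f in frames) / t for i in range(d)]


def temporal_expert(frames):
    """Hypothetical 'temporal' expert: mean absolute frame-to-frame
    difference per feature, so it responds only to change over time."""
    d = len(frames[0])
    if len(frames) < 2:
        return [0.0] * d
    n = len(frames) - 1
    return [
        sum(abs(frames[k + 1][i] - frames[k][i]) for k in range(n)) / n
        for i in range(d)
    ]


def heterogeneous_moe(frames, gate_logits):
    """Mix the two functionally different experts with softmax gate weights.

    Assumption: gate_logits has one entry per expert and is supplied by the
    caller; a real MoE would compute it with a learned router."""
    experts = [spatial_expert, temporal_expert]
    weights = softmax(gate_logits)
    outputs = [expert(frames) for expert in experts]
    d = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(d)]


# Two frames with two features each: feature 0 changes, feature 1 is static.
frames = [[0.0, 1.0], [2.0, 1.0]]
mixed = heterogeneous_moe(frames, gate_logits=[0.0, 0.0])
```

In this toy setting the two experts see the same clip but extract different signals: the spatial expert reports the average appearance, while the temporal expert reports motion and returns zero for the static feature. A homogeneous design would instead apply the same function in every expert slot.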
