Multimodal Large Language Model-Enabled Video Translation: A Role-Oriented Survey (Article)

arXiv cs.CV · 2026-06-02 · News · Language: English · Authors: Bingzheng Qu, Kehai Chen, Xuefeng Bai, Min Zhang

Category: Natural Language Processing



Related Technologies

Multimodal Large Language Models (MLLMs), context-aware speech generation, multimodal fusion, temporal reasoning, video understanding, multimodal reasoning, lip synchronization, text-to-speech (TTS), machine translation, automatic speech recognition, LLM