Multimodal Large Language Model-Enabled Video Translation: A Role-Oriented Survey
Event: PRODUCT_LAUNCH | Date: 2026-06-02 | Impact: MEDIUM
arXiv:2604.11283v2 (Announce Type: replace)
Abstract: Recent progress in multimodal large language models (MLLMs) is reshaping video translation from a cascaded pipeline of automatic speech recognition, machine translation, text-to-speech, and lip synchronization into a unified multimodal reasoning and generation problem. High-quality video translation requires not only semantic fidelity, but also temporal alignment
Source: arXiv cs.CV, 2026-06-02