Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs

arXiv cs.CL, 2026-05-26. Authors: Bo Li, Tianyu Dong, Shaolin Zhu, Deyi Xiong

Abstract

arXiv:2605.24681v1 (announce type: new)

Large Language Models (LLMs) have shown great promise in multilingual machine translation (MT), even with limited bilingual supervision. However, fine-tuning LLMs on parallel corpora presents a major challenge, namely parameter interference. To address this, we propose Mix-MoE, a mixed Mixture-of-Experts framework designed to train LLMs for multilingual MT. Our framework operates in two distinct stages: (1) post-pretraining with MoE on monolingual corpora, and (2) post-pretraining with MoE on parallel corpora. Crucially, we divide the MoE layers into two specialized groups: Language Model Experts (LM Experts) and Machine Translation Experts (MT Experts). LM Experts are designed to capture and retain the monolingual knowledge learned by the pre-trained LLM. MT Experts, on the other hand, are specifically trained to acquire and store bilingual translation knowledge.
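To make the two-group expert layout concrete, here is a minimal pure-Python sketch of a single MoE layer whose experts are tagged as LM or MT. It is an illustration only, not the authors' implementation: the abstract does not describe the router, the expert architecture, or how each stage updates each group. The top-k softmax router, the linear experts, the names `Expert`, `MixedMoELayer`, and `set_stage`, and the rule "stage 1 updates LM Experts, stage 2 updates MT Experts" are all assumptions made for this example.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(rows, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]


class Expert:
    """Toy linear expert y = W x, tagged with a group: 'lm' or 'mt'."""

    def __init__(self, dim, group, rng):
        self.group = group
        self.trainable = False  # toggled per training stage
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        return matvec(self.W, x)


class MixedMoELayer:
    """Hypothetical MoE layer with LM Experts followed by MT Experts."""

    def __init__(self, dim, n_lm, n_mt, top_k=2, seed=0):
        rng = random.Random(seed)
        self.experts = [Expert(dim, "lm", rng) for _ in range(n_lm)]
        self.experts += [Expert(dim, "mt", rng) for _ in range(n_mt)]
        # One router row per expert (assumed: a plain linear router).
        self.router = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                       for _ in self.experts]
        self.top_k = top_k

    def set_stage(self, stage):
        """Assumption: stage 1 (monolingual data) updates LM Experts,
        stage 2 (parallel data) updates MT Experts; the other group is frozen."""
        if stage not in (1, 2):
            raise ValueError("stage must be 1 or 2")
        target = "lm" if stage == 1 else "mt"
        for expert in self.experts:
            expert.trainable = expert.group == target

    def forward(self, x):
        """Route x to the top-k experts and mix their outputs by gate weight."""
        logits = matvec(self.router, x)
        top = sorted(range(len(logits)), key=lambda i: logits[i],
                     reverse=True)[: self.top_k]
        gates = softmax([logits[i] for i in top])
        out = [0.0] * len(x)
        for gate, idx in zip(gates, top):
            y = self.experts[idx](x)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, top


layer = MixedMoELayer(dim=4, n_lm=2, n_mt=2, top_k=2, seed=0)
layer.set_stage(2)  # parallel-corpus stage: only MT Experts are marked trainable
output, chosen = layer.forward([1.0, 0.5, -0.5, 2.0])
print(chosen, [e.group for e in layer.experts])
```

Keeping the two groups in separate parameters is what would let a training loop leave the LM Experts untouched while the MT Experts absorb translation data, which is one plausible way to limit the parameter interference the abstract mentions.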
