Leveraging Routing Dynamics in Mixture-of-Experts Models for Efficient Language Adaptation

arXiv cs.CL, 2026-05-29. Authors: Aditi Khandelwal, Marius Mosbach, Verna Dankers, Siva Reddy, Golnoosh Farnadi

Abstract

arXiv:2605.29714v1. Mixture-of-Experts (MoE) models are widely used to scale language models, yet their expert routing behavior and adaptation in a multilingual setting remain underexplored. In this work, we study multilingual routing dynamics during continual pre-training of an English-centric MoE model on a multilingual corpus, analyzing how expert usage varies across languages. We find that continual multilingual pre-training leads to diffused, language-agnostic routing in early and middle layers, with language specialization primarily emerging in the final layers. We also show that token-level vocabulary overlap between languages plays an important role in how languages are routed. Motivated by these findings, we propose a parameter-efficient adaptation strategy that updates language-specific and shared experts in the final MoE layers.
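
To make "how expert usage varies across languages" concrete, here is a minimal sketch of one way such a comparison could be set up. It is not the authors' method: the abstract does not say which metric they use, so the Jensen-Shannon divergence, the helper names `expert_usage` and `js_divergence`, the layer labels, and the toy routing data are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def expert_usage(expert_ids, num_experts):
    """Fraction of routed tokens sent to each expert (hypothetical helper)."""
    counts = Counter(expert_ids)
    total = len(expert_ids)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 = identical usage, 1 = disjoint usage."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy data (invented): the expert chosen for each token, per layer and language.
NUM_EXPERTS = 4
routes = {
    "early": {"en": [0, 1, 2, 3, 0, 1, 2, 3], "de": [0, 1, 2, 3, 3, 2, 1, 0]},
    "final": {"en": [0, 0, 0, 1, 0, 0, 1, 0], "de": [2, 3, 3, 3, 2, 3, 3, 2]},
}

# Low divergence suggests language-agnostic routing; high divergence suggests
# that the two languages rely on different experts in that layer.
divergence = {
    layer: js_divergence(
        expert_usage(by_lang["en"], NUM_EXPERTS),
        expert_usage(by_lang["de"], NUM_EXPERTS),
    )
    for layer, by_lang in routes.items()
}

for layer, value in divergence.items():
    print(f"{layer}: JS divergence = {value:.3f}")
```

In this toy setup the two languages share the same experts in the early layer and use disjoint experts in the final layer, which mirrors the qualitative pattern the abstract reports. Under this assumed metric, layers with high divergence would be the natural candidates for language-specific updates.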
