On the Limits of Model Merging for Multilinguality in Pre-Training

ArXiv CS.CL | 2026-05-26 | NEWS | en | Authors: Seth Aycock, Fedor Vitiugin, Aleksandr Umnov, Christof Monz, Khalil Sima'an

Details

Source site
ArXiv CS.CL
Authors
Seth Aycock, Fedor Vitiugin, Aleksandr Umnov, Christof Monz, Khalil Sima'an
Article type
NEWS
Language
en
Publication date
2026-05-26

Abstract

arXiv:2605.25846v1 (announce type: new)

Endowing models with consistent multilingual performance can be achieved either by mixing pre-training data or through post-training approaches such as language-specific model merging. In this work, we test whether merging can be applied to monolingually pre-trained models. We conduct a controlled study of the efficacy of mixed, merged, and monolingual pre-training setups. We find that while monolingual pre-training yields strong in-language performance, merging any combination of monolingual models leads to performance collapse due to interference. Our analysis suggests that representational similarity is a prerequisite for model merging. We therefore conclude that the flexibility of merging in fine-tuning does not extend trivially to language-specific pre-training.
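For readers unfamiliar with the term, model merging in its simplest form combines several models with identical architectures by averaging their parameters. The abstract does not say which merging method the authors use, so the following is only a minimal sketch of uniform parameter averaging. The `StateDict` type, the `merge_uniform` function, and the toy `model_en` / `model_de` weights are all hypothetical names introduced for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical flat representation of a model's parameters:
# parameter name -> list of float values.
StateDict = Dict[str, List[float]]


def merge_uniform(
    models: List[StateDict], weights: Optional[List[float]] = None
) -> StateDict:
    """Merge models by a (weighted) average of each parameter.

    Assumes every model has the same parameter names and shapes.
    This is an illustrative sketch, not the method from the paper.
    """
    if not models:
        raise ValueError("need at least one model to merge")
    if weights is None:
        # Default: plain uniform averaging.
        weights = [1.0 / len(models)] * len(models)
    if len(weights) != len(models):
        raise ValueError("one weight per model is required")

    names = models[0].keys()
    merged: StateDict = {}
    for name in names:
        length = len(models[0][name])
        for m in models:
            if m.keys() != names or len(m[name]) != length:
                raise ValueError(f"parameter mismatch for {name!r}")
        # Weighted sum of the i-th coordinate across all models.
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models))
            for i in range(length)
        ]
    return merged


# Toy stand-ins for two monolingually trained models (assumed values).
model_en = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
model_de = {"layer.weight": [3.0, -2.0], "layer.bias": [1.0]}

merged = merge_uniform([model_en, model_de])
print(merged)
```

In this toy case the second weight coordinate averages to zero because the two models disagree in sign. That is only a cartoon of how averaging parameters learned in incompatible ways might cancel useful signal; it is not the analysis reported in the paper.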
