When Model Merging Breaks Routing: Training-Free Calibration for MoE

arXiv cs.CL, 2026-06-03. Authors: Canbin Huang, Tianyuan Shi, Xiaojun Quan, Jingang Wang, Jianfei Zhang, Qifan Wang

Abstract

arXiv:2606.03391v1 (announce type: cross)

Model merging has emerged as a cost-effective approach for consolidating the capabilities of multiple LLMs without retraining. However, existing merging techniques, largely based on linear parameter arithmetic or optimization, struggle when applied to Mixture-of-Experts (MoE) architectures. We identify a critical failure mode in MoE merging, termed routing breakdown, in which the merged router fails to dispatch tokens to suitable experts. Routing breakdown stems from the sensitivity of the non-linear softmax and discrete Top-k routing mechanisms to parameter perturbations from merging, a sensitivity further amplified by load-balancing constraints imposed during MoE pretraining. Because fine-tuned experts exhibit distinct specializations, even modest misrouting can cause severe performance degradation.
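
To make the routing-breakdown idea concrete, the following toy sketch shows how a linear merge of two routers can change a discrete Top-k decision. It is an illustration only, not the paper's method or data: the weights in `ROUTER_A` and `ROUTER_B`, the token vector, the helper names `route` and `average`, and the choice of plain parameter averaging as the merge are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(router, token, k=1):
    """Return the indices of the top-k experts for one token.

    `router` holds one weight row per expert; the logit of an expert
    is the dot product of its row with the token vector.
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    probs = softmax(logits)
    return sorted(range(len(probs)), key=lambda i: -probs[i])[:k]


def average(router_a, router_b):
    """Merge two routers by element-wise parameter averaging (assumed merge rule)."""
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(router_a, router_b)
    ]


# Hypothetical routers of two fine-tuned models (3 experts, 2 input dims).
ROUTER_A = [[3.0, 0.0], [0.0, 1.0], [2.0, 0.5]]
ROUTER_B = [[0.0, 1.0], [3.0, 0.0], [2.0, 0.5]]
TOKEN = [1.0, 1.0]

merged = average(ROUTER_A, ROUTER_B)

print("model A routes to:", route(ROUTER_A, TOKEN))   # [0]
print("model B routes to:", route(ROUTER_B, TOKEN))   # [1]
print("merged routes to: ", route(merged, TOKEN))     # [2]
```

In this toy case each source router sends the token to its own preferred expert, while the averaged router sends it to a third expert that neither source model selected. The logits change smoothly under averaging, but the Top-k selection is discrete, so the chosen expert switches outright.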
