ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression

arXiv cs.AI, 2026-05-29
Authors: Yilun Yao, Jiaming Pan, Elsie Dai, Peizhuang Cong, Yaoming Li, Tong Yang

Abstract

arXiv:2605.29350v1

Mixture-of-Experts (MoE) language models reduce per-token computation but still require storing and serving all experts, making deployment memory-intensive. Existing post-training compression methods mainly shrink this cost by pruning experts or merging their weights. We formulate post-training MoE compression as expert-pool consolidation: retaining a smaller set of pretrained experts as reusable prototypes and deterministically remapping each original expert reference to one selected prototype. This view separates the reduced expert pool from the reuse structure that represents the original expert slots, and allows prototype sharing within local layer scopes while preserving the original router interface.
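The remapping idea described in the abstract can be illustrated with a toy single-layer sketch. It is not the authors' implementation: the abstract does not say how prototypes are chosen or how slots are assigned, so the nearest-prototype rule in `build_remap`, the scalar "weights", and all function names here are hypothetical stand-ins. What the sketch does show is the separation the abstract describes: the router still emits original expert indices, and a fixed lookup table resolves each one to a retained prototype.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def make_scaling_expert(scale: float) -> Expert:
    """Toy stand-in for a pretrained expert: multiplies its input by a scalar."""
    return lambda x: [scale * v for v in x]


def build_remap(weights: Sequence[float], prototype_ids: Sequence[int]) -> List[int]:
    """Map every original expert slot to exactly one retained prototype.

    Assumption: each slot goes to the prototype whose scalar weight is
    closest. The paper's actual assignment rule is not given in the abstract.
    """
    remap = []
    for w in weights:
        # min() returns the first minimum, so ties resolve deterministically.
        remap.append(min(prototype_ids, key=lambda p: abs(weights[p] - w)))
    return remap


def moe_forward(
    x: Vector,
    router_choice: Sequence[int],
    gate: Sequence[float],
    remap: Sequence[int],
    pool: Dict[int, Expert],
) -> Vector:
    """Combine expert outputs; the router interface is unchanged.

    router_choice holds ORIGINAL expert indices; remap redirects each one
    to a prototype that is actually stored in the reduced pool.
    """
    out = [0.0] * len(x)
    for slot, g in zip(router_choice, gate):
        y = pool[remap[slot]](x)
        out = [o + g * v for o, v in zip(out, y)]
    return out


# Four original experts, of which only experts 0 and 2 are kept as prototypes.
weights = [1.0, 1.1, 3.0, 2.9]
prototype_ids = [0, 2]
remap = build_remap(weights, prototype_ids)

# The reduced pool stores two experts instead of four.
pool = {p: make_scaling_expert(weights[p]) for p in prototype_ids}

# The router picks original slots 1 and 3; both are no longer stored.
output = moe_forward([1.0, 2.0], router_choice=[1, 3], gate=[0.5, 0.5],
                     remap=remap, pool=pool)
print(remap)   # [0, 0, 2, 2]
print(output)  # [2.0, 4.0]
```

In this toy setup the memory saving comes from `pool` holding two experts rather than four, while `remap` keeps all four original slots addressable. Sharing prototypes across several layers, which the abstract mentions as "local layer scopes", is not modeled here.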
