Event: Pruning and Distilling Mixture-of-Experts into Dense Language Models

PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM

arXiv:2605.28207v1 Announce Type: new

Abstract: Mixture-of-Experts (MoE) is now the dominant architecture for frontier language models, yet it requires all expert parameters to be loaded in memory, which makes it poorly suited to memory-constrained deployment. Existing compression methods reduce the number of experts, but the output remains an MoE model with the same fundamental limitation. We present the first systematic framewo