ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts

ArXiv CS.AI | 2026-06-02 | Authors: Heng Zhao, Zilei Shao, Guy Van den Broeck, Zhe Zeng

Abstract

arXiv:2606.01509v1 (cross-listed)

Mixture-of-Experts (MoE) models scale by activating only a small subset of experts per token. Training such models remains challenging, however, because top-$k$ routing is discrete and non-differentiable: expert selection requires a gradient estimator, and the design of that estimator remains a central open problem. We introduce ProbMoE, a probabilistic routing framework that models expert selection as a distribution over cardinality-constrained expert subsets and formulates routing as probabilistic inference in this discrete subset space. We first propose ProbMoE Exact-$k$ routing, which samples $k$-expert subsets in the forward pass and, in the backward pass, differentiates through each expert's exact marginal probability as a tractable surrogate for the true gradient.
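To make the idea of exact marginals over $k$-expert subsets concrete, here is a minimal sketch. It is not taken from the paper. The abstract does not say how the subset distribution is parameterized, so the sketch assumes a product form, $p(S) \propto \prod_{i \in S} \exp(\ell_i)$ over all subsets with $|S| = k$, where $\ell_i$ is a router logit. The helper names `elementary_symmetric`, `exact_marginals`, and `sample_subset` are hypothetical. Under that assumption, each expert's marginal inclusion probability follows from elementary symmetric polynomials, and a subset can be sampled one expert at a time.

```python
import math
import random


def elementary_symmetric(weights, k):
    """Return [e_0, ..., e_k]; e_j sums the products of weights over all j-subsets."""
    e = [1.0] + [0.0] * k
    for w in weights:
        # Go downward so each weight is used at most once per subset.
        for j in range(k, 0, -1):
            e[j] += w * e[j - 1]
    return e


def exact_marginals(logits, k):
    """P(i in S) under the ASSUMED model p(S) ∝ prod_{i in S} exp(logit_i), |S| = k."""
    shift = max(logits)  # subtracting a constant leaves the distribution unchanged
    weights = [math.exp(x - shift) for x in logits]
    z = elementary_symmetric(weights, k)[k]  # normalizer over all k-subsets
    marginals = []
    for i, w in enumerate(weights):
        rest = weights[:i] + weights[i + 1:]
        # Subsets containing i: pick i, then any (k-1)-subset of the others.
        marginals.append(w * elementary_symmetric(rest, k - 1)[k - 1] / z)
    return marginals


def sample_subset(logits, k, rng):
    """Draw one k-subset from the same assumed distribution, expert by expert."""
    shift = max(logits)
    weights = [math.exp(x - shift) for x in logits]
    chosen, remaining = [], k
    for i, w in enumerate(weights):
        if remaining == 0:
            break
        tail = weights[i + 1:]
        # Unnormalized mass of completions that include vs. exclude expert i.
        include = w * elementary_symmetric(tail, remaining - 1)[remaining - 1]
        exclude = elementary_symmetric(tail, remaining)[remaining]
        if rng.random() * (include + exclude) < include:
            chosen.append(i)
            remaining -= 1
    return chosen


logits = [0.5, -1.0, 2.0, 0.0]  # hypothetical router logits for 4 experts
marginals = exact_marginals(logits, k=2)
subset = sample_subset(logits, k=2, rng=random.Random(0))
```

The marginals sum to $k$, since every sampled subset contains exactly $k$ experts. In an autodiff framework, the surrogate gradient described in the abstract would be obtained by backpropagating through a differentiable version of `exact_marginals` while the forward pass uses the sampled subset. How ProbMoE combines the two is not specified in the abstract.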
