ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts

arXiv cs.AI, 2026-06-02. Authors: Heng Zhao, Zilei Shao, Guy Van den Broeck, Zhe Zeng

Abstract

arXiv:2606.01509v1 (cross-listed). Mixture-of-Experts (MoE) models scale by activating only a small subset of experts per token. However, training such models remains challenging because top-$k$ routing is discrete and non-differentiable, so expert selection requires gradient estimators, and the design of those estimators remains a central open problem. We introduce ProbMoE, a probabilistic routing framework that models expert selection as a distribution over cardinality-constrained expert subsets and formulates routing as probabilistic inference in this discrete subset space. We first propose ProbMoE Exact-$k$ routing, which samples $k$-expert subsets in the forward pass and, in the backward pass, uses gradients through each expert's exact marginal probability as a tractable surrogate for the true gradient.
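To make the idea of a distribution over $k$-expert subsets and its exact per-expert marginals concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not specify the form of the subset distribution, so the sketch assumes that the probability of a subset is proportional to the product of `exp(logit)` over its members; the function names `subset_distribution`, `exact_marginals`, and `sample_subset` are hypothetical. It enumerates all subsets by brute force, which is only feasible for small numbers of experts.

```python
import itertools
import math
import random


def subset_distribution(logits, k):
    """Enumerate every k-subset of experts with its probability.

    ASSUMPTION (not from the abstract): P(S) is proportional to
    prod_{i in S} exp(logits[i]).
    """
    subsets = list(itertools.combinations(range(len(logits)), k))
    weights = [math.exp(sum(logits[i] for i in s)) for s in subsets]
    total = sum(weights)
    return subsets, [w / total for w in weights]


def exact_marginals(logits, k):
    """Marginal probability that each expert belongs to the sampled subset."""
    subsets, probs = subset_distribution(logits, k)
    marginals = [0.0] * len(logits)
    for s, p in zip(subsets, probs):
        for i in s:
            marginals[i] += p
    return marginals


def sample_subset(logits, k, rng):
    """Draw one k-subset of expert indices (the forward-pass selection)."""
    subsets, probs = subset_distribution(logits, k)
    return rng.choices(subsets, weights=probs, k=1)[0]


if __name__ == "__main__":
    router_logits = [1.0, 0.5, -0.2, 0.0]  # hypothetical router scores
    rng = random.Random(0)
    print("sampled experts:", sample_subset(router_logits, 2, rng))
    print("marginals:", exact_marginals(router_logits, 2))
```

Because every sampled subset contains exactly $k$ experts, the marginals always sum to $k$. In a training setup, the marginals would be computed with a differentiable routine so that gradients can flow through them in the backward pass; that part is omitted here.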
