ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts (Event)

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.01509v1 (Announce Type: cross)

Abstract: Mixture-of-Experts (MoE) models scale by activating only a small subset of experts per token. However, training such models remains challenging: top-$k$ routing is discrete and non-differentiable, so expert selection requires gradient estimators, and the design of those estimators remains a central open problem. We introduce ProbMoE, a probabilistic routing framework that models expert selection…
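To make the differentiability problem concrete, here is a minimal sketch of a generic softmax top-$k$ gate. It is not ProbMoE's method; the abstract is cut off before describing that. All function names (`top_k_route`, `selection_mask`, `finite_diff_mask`) are hypothetical. The sketch shows that the set of selected experts is piecewise constant in the router logits. A finite-difference probe therefore reads zero almost everywhere and blows up only where two experts swap rank. This is why gradient estimators are needed for the selection step.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax over router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(logits: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Hypothetical generic top-k gate: pick the k most probable experts,
    then renormalize their softmax probabilities into gate weights."""
    probs = softmax(logits)
    # Discrete step: keep the k largest. This index set does not vary smoothly with the logits.
    chosen = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    gates = [probs[i] / total for i in chosen]
    return chosen, gates


def selection_mask(logits: List[float], k: int) -> List[float]:
    # 1.0 for experts the token is routed to, 0.0 otherwise.
    chosen, _ = top_k_route(logits, k)
    return [1.0 if i in chosen else 0.0 for i in range(len(logits))]


def finite_diff_mask(logits: List[float], k: int, eps: float = 1e-4) -> List[List[float]]:
    """Finite-difference estimate of d(mask)/d(logit_j); row j perturbs logit j."""
    base = selection_mask(logits, k)
    grads = []
    for j in range(len(logits)):
        bumped = list(logits)
        bumped[j] += eps
        new = selection_mask(bumped, k)
        grads.append([(n - b) / eps for n, b in zip(new, base)])
    return grads


# Well-separated logits: the mask never changes, so every estimate is zero.
logits = [2.0, 0.5, 1.2, -0.3]
chosen, gates = top_k_route(logits, k=2)
flat = finite_diff_mask(logits, k=2)

# Near-tie between experts 1 and 2: a tiny bump flips the selection,
# producing a huge finite difference (a jump, not a usable gradient).
near_tie = [2.0, 1.19999, 1.2, -0.3]
jump = finite_diff_mask(near_tie, k=2)
```

With the well-separated logits, experts 0 and 2 are chosen and every entry of `flat` is zero. In `jump`, perturbing expert 1 swaps it in for expert 2, so `jump[1]` has entries on the order of $\pm 1/\varepsilon$. Only the renormalized gate values carry a smooth gradient; the choice of which experts run does not.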