Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference 事件

PRODUCT_LAUNCH2026-06-02影响: MEDIUM

Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference arXiv:2606.01007v1 Announce Type: cross Abstract: Sparsely activated Mixture-of-Experts (MoE) models scale capacity via conditional computation, but distributed inference suffers from cross-GPU expert communication and routing-induced load imbalance. Existing placement methods reduce this cost by co-locating frequently co-activated experts; however, they derive a single deployment plan from globally a

Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference · 相关人物