dMoE: dLLMs with Learnable Block Experts 文章

ArXiv CS.CL2026-06-01NEWSen作者: Sicheng Feng, Zigeng Chen, Gongfan Fang, Xinyin Ma, Xinchao Wang

摘要

arXiv:2605.30876v1 Announce Type: new Abstract: Diffusion Large Language Models (dLLMs) have recently emerged as a promising alternative to autoregressive models, offering competitive performance while naturally supporting parallel decoding. However, as dLLMs are increasingly integrated with Mixture-of-Experts (MoE) architectures to scale model capacity, a fundamental mismatch arises between block parallel decoding and token-level expert selection. Specifically, each dLLM forward pass processes multiple tokens with bidirectional dependencies, whereas conventional MoE layers route each token independently. This mismatch substantially increases the number of uniquely activated experts, making inference increasingly memory-bound. To address this, we propose dMoE, a simple yet effective block-level MoE framework.

相关事件查看全部 (1)

dMoE: dLLMs with Learnable Block Experts
2026-06-01PRODUCT_LAUNCH影响: MEDIUM

相关公司

暂无数据

相关人物

暂无数据