DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention (Event)

Name: DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention
Start: 2026-06-02

PRODUCT_LAUNCH · 2026-06-02 · Impact: MEDIUM

arXiv:2603.08026v2 (announce type: replace)

Abstract: Masked diffusion language models enable parallel token decoding, providing a promising alternative to the sequential nature of autoregressive generation. However, their iterative denoising process remains computationally expensive because it repeatedly processes the entire sequence at every step. We observe that across these diffusion steps, most

Artificial Intelligence
