Fast-dLLM++: Fréchet Profile Decoding for Faster Diffusion LLM Inference

arXiv cs.CL, 2026-06-03. Authors: Siva Rajesh Kasa, Yasong Dai, Sumit Negi, Hongdong Li

Abstract

arXiv:2606.02955v1 (announce type: new)

Diffusion large language models promise parallel token generation, yet inference remains bottlenecked by deciding which masked tokens can be safely committed together. Fast-dLLM addressed this with KV caching and confidence-guided parallel decoding, but its decoding theory uses a homogeneous high-confidence assumption that effectively reduces each candidate set to its weakest selected token. We argue that this leaves speed on the table because real decoding steps exhibit heterogeneous confidence profiles. We propose Fast-dLLM++, a training-free extension that introduces Fréchet profile decoding: selecting parallel commit sets from the full sorted confidence profile rather than a single worst-case confidence.
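
The abstract does not state the selection rule itself, so the following is only a minimal sketch of the contrast it describes, not the authors' algorithm. It assumes the classical Fréchet inequality, which bounds the probability that all selected tokens are correct from below by 1 minus the sum of (1 - p_i) over the selected confidences p_i. The function names, the error budget `eps`, and the "weakest token" baseline are hypothetical illustrations; the baseline is a simplified reading of "reduces each candidate set to its weakest selected token", not Fast-dLLM's actual rule.

```python
from typing import List, Sequence


def frechet_lower_bound(confidences: Sequence[float]) -> float:
    """Fréchet lower bound on P(all tokens correct), given only marginals."""
    return max(0.0, 1.0 - sum(1.0 - p for p in confidences))


def _descending_order(confidences: Sequence[float]) -> List[int]:
    """Token indices sorted from most to least confident."""
    return sorted(range(len(confidences)),
                  key=lambda i: confidences[i], reverse=True)


def select_by_profile(confidences: Sequence[float], eps: float) -> List[int]:
    """Hypothetical profile rule: grow the commit set while the summed
    per-token error (1 - p_i) stays within the budget eps."""
    chosen: List[int] = []
    deficit = 0.0
    for i in _descending_order(confidences):
        deficit += 1.0 - confidences[i]
        if deficit > eps:
            break
        chosen.append(i)
    return chosen


def select_by_weakest(confidences: Sequence[float], eps: float) -> List[int]:
    """Hypothetical baseline: charge every selected token the error of the
    weakest one, i.e. require k * (1 - p_min) <= eps."""
    chosen: List[int] = []
    for k, i in enumerate(_descending_order(confidences), start=1):
        # In descending order, the k-th token is the weakest of the top k.
        if k * (1.0 - confidences[i]) > eps:
            break
        chosen.append(i)
    return chosen


if __name__ == "__main__":
    conf = [0.96, 0.99, 0.99, 0.99]  # one weaker token among strong ones
    print(select_by_profile(conf, eps=0.1))
    print(select_by_weakest(conf, eps=0.1))
    print(frechet_lower_bound(conf))
```

With this heterogeneous profile, the profile rule commits all four tokens because their summed error (about 0.07) fits the budget, while the weakest-token baseline stops at three because it charges all four tokens the 0.04 error of the weakest. That gap is the kind of slack the abstract attributes to the homogeneous assumption.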