BlockBatch: Multi-Scale Consensus Decoding for Efficient Diffusion Language Model Inference

ArXiv CS.AI, 2026-05-29. Authors: Xiaoyou Wu, Cheng-Jhih Shih, Binfei Ji, Yong Liu, Yingyan (Celine) Lin

Abstract

arXiv:2605.29233v1 (cross-listed). Diffusion language models (dLLMs) generate text by iteratively denoising multiple token positions in parallel, offering an attractive alternative to strictly autoregressive decoding. In practice, however, block-wise dLLM inference exposes a difficult granularity trade-off: small blocks preserve local conditioning but require many denoising steps, whereas large blocks expose more parallelism but can make premature commitments and accumulate cache error. Existing acceleration methods typically choose a single block size per request, leaving the complementarity among block sizes unused. We show that block size itself is a useful branching dimension. Different block sizes induce related but non-identical KV-cache trajectories: branches often share an initial prefix, bifurcate at semantically decisive positions, and later agree on syntactically lightweight tokens.
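
The observation about branches sharing a prefix, bifurcating, and later re-agreeing can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not spell out; it only measures agreement among token sequences from several branches. The helper names (`shared_prefix_length`, `agreement_mask`) and the toy branch outputs are hypothetical, chosen purely for illustration.

```python
from typing import Dict, List


def shared_prefix_length(branches: List[List[str]]) -> int:
    """Length of the longest prefix on which every branch agrees."""
    n = 0
    for tokens in zip(*branches):
        if len(set(tokens)) != 1:
            break
        n += 1
    return n


def agreement_mask(branches: List[List[str]]) -> List[bool]:
    """True at each position where all branches propose the same token."""
    return [len(set(tokens)) == 1 for tokens in zip(*branches)]


# Hypothetical outputs of three decoding branches, keyed by block size.
# These token lists are made up for illustration, not taken from the paper.
branches_by_block_size: Dict[int, List[str]] = {
    4: ["The", "answer", "is", "42", ",", "so", "we", "stop", "."],
    8: ["The", "answer", "is", "24", ",", "so", "we", "stop", "."],
    16: ["The", "answer", "is", "42", ",", "and", "we", "stop", "."],
}

branches = list(branches_by_block_size.values())
prefix_len = shared_prefix_length(branches)
mask = agreement_mask(branches)

print("shared prefix length:", prefix_len)
print("agreement mask:", mask)
```

In this toy data the branches agree on the first three tokens, disagree at a content-bearing position, and agree again on punctuation, which mirrors the pattern the abstract describes.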