Abstract
arXiv:2605.30753v1
Diffusion-based large language models (dLLMs) support parallel text generation via iterative denoising, yet inference remains latency-heavy because many steps are spent on redundant refinement and repeated remasking of tokens whose final values are already determined. Prior acceleration methods mainly depend on step-local confidence heuristics or fixed schedules, which are sensitive to prompt and task variation and ignore strong positional effects within a sequence. We cast diffusion decoding as a dynamic control problem and show that token-wise denoising trajectories provide the key signal for reliable control. We propose a trace-aware decoding framework with two components.
Related events
Efficient Diffusion LLMs via Temporal-Spatial Parallel Decoding and Confidence Extrapolation
2026-06-01 | PRODUCT_LAUNCH | Impact: MEDIUM