From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

ArXiv CS.CL, 2026-05-29. Authors: Xiangyu Ma, Teng Xiao, Zuchao Li, Lefei Zhang

Abstract

arXiv:2605.27387v2

Diffusion models promise efficient parallel text generation but rely on bidirectional attention, creating a structural mismatch with pre-trained Autoregressive (AR) models. This incompatibility precludes reusing robust AR priors, necessitating prohibitive pre-training from scratch. To bridge this gap, we propose FLUID, a framework that efficiently adapts AR backbones to the diffusion paradigm. By enforcing Strictly Causal Alignment, FLUID enables seamless initialization from standard GPT-style checkpoints, circumventing the need for massive pre-training. Furthermore, we introduce Elastic Horizons, an entropy-driven mechanism that dynamically modulates denoising strides based on local information density rather than fixed schedules.
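The abstract does not spell out how Elastic Horizons chooses a stride, so the following is only a minimal sketch of one way an entropy-driven stride could work, not FLUID's actual algorithm. It assumes the model exposes a predictive distribution for each upcoming position, and it commits as many leading positions as stay below an entropy threshold. The names `token_entropy` and `elastic_stride` and the parameters `threshold` and `max_stride` are hypothetical and chosen for illustration.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a single predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def elastic_stride(dists, threshold=0.5, max_stride=4):
    """Hypothetical stride rule (an assumption, not the paper's method).

    Walk the upcoming positions left to right and count how many
    consecutive distributions have entropy at or below `threshold`.
    The result is capped at `max_stride` and is never less than 1, so
    decoding always makes progress.
    """
    stride = 0
    for probs in dists[:max_stride]:
        if token_entropy(probs) > threshold:
            break  # high local uncertainty: stop extending the horizon
        stride += 1
    return max(1, stride)


# Toy distributions over a 4-token vocabulary.
confident = [0.97, 0.01, 0.01, 0.01]  # low entropy
uncertain = [0.25, 0.25, 0.25, 0.25]  # maximum entropy for 4 outcomes

stride = elastic_stride([confident, confident, uncertain, confident])
```

In this toy setting, low-entropy regions yield a longer stride and high-entropy regions fall back to a stride of one, which mirrors the contrast the abstract draws between density-based strides and fixed schedules.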