Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation (Event)

ACQUISITION | 2026-06-08 | Impact: HIGH

arXiv:2606.06712v1 (Announce Type: new)

Abstract: We study the transformation of autoregressive models (ARLMs) into diffusion language models (DLMs). Rather than pretraining from scratch, prior work replaces the causal attention in ARLMs with bidirectional attention and then trains the resulting model using a DLM objective. However, these approaches incur two distribution shifts. First, transitioning from a next