GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models (Event)

PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM

arXiv:2605.29398v1 Announce Type: cross

Abstract: Reinforcement learning (RL) can be used to improve the policy (denoiser) of diffusion large language models (dLLMs), but it is hindered by the intractability of the policy likelihood. A dominant and efficient family of methods replaces the likelihood in standard RL with its evidence lower bound (ELBO), estimated from randomly masked sequences. Despi