Event: Reconsidering Positional Supervision in Masked Diffusion Language Model Training

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2601.22947v2 (Announce Type: replace)

Abstract: Masked diffusion language models (MDLMs) generate text by unmasking tokens in parallel and have recently emerged as alternatives to autoregressive language models. They can be viewed as parallel decoders trained with a position-wise cross-entropy (CE) loss, the same setup as non-autoregressive translation (NAT). In NAT, CE-trained parallel decoders have been argue
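
For readers unfamiliar with the term, the position-wise cross-entropy loss mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the function names `softmax` and `positionwise_masked_ce` are hypothetical, the loss is averaged only over masked positions, and any timestep- or masking-rate-dependent weighting that a real MDLM objective might apply is assumed away.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's vocabulary logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def positionwise_masked_ce(logits, targets, masked):
    """Mean of -log p(target) over masked positions (hypothetical helper).

    logits:  one list of vocabulary logits per sequence position
    targets: the ground-truth token id at each position
    masked:  True where the input token was masked and must be predicted

    Each position is scored independently of the others, which is what
    "position-wise" refers to. Loss weighting is omitted (an assumption).
    """
    losses = []
    for pos_logits, target, is_masked in zip(logits, targets, masked):
        if not is_masked:
            continue  # unmasked positions contribute nothing to the loss
        probs = softmax(pos_logits)
        losses.append(-math.log(probs[target]))
    # Return 0.0 when nothing is masked to avoid dividing by zero.
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Toy example: 3 positions, vocabulary of 4 tokens, positions 0 and 2 masked.
    toy_logits = [[2.0, 0.5, 0.1, 0.1], [0.0, 0.0, 0.0, 0.0], [0.1, 0.2, 3.0, 0.3]]
    toy_targets = [0, 1, 2]
    toy_masked = [True, False, True]
    print(positionwise_masked_ce(toy_logits, toy_targets, toy_masked))
```

Because each masked position is scored on its own, the loss contains no term that couples the predictions at different positions.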