DLLM-JEPA: Joint Embedding Predictive Architectures for Masked Diffusion Language Models 文章

ArXiv CS.CL2026-06-02NEWSen作者: Sangdae Nam

摘要

arXiv:2606.00091v1 Announce Type: new Abstract: Joint Embedding Predictive Architectures (JEPAs) have reshaped self-supervised representation learning in vision. The recent LLM-JEPA ported JEPA to autoregressive language models but inherited two steep costs from the causal-attention substrate: it demands explicit multi-view data (e.g., text-code pairs), and it requires two gradient-carrying forward passes per step. We introduce DLLM-JEPA, which pairs JEPA with masked-diffusion language models to eliminate both costs at once. The bidirectional attention of diffusion models yields two semantically distinct views of the same input via different masking rates -- no explicit pairs needed -- and supports a single gradient-carrying forward pass, cutting training FLOPs by 33% relative to LLM-JEPA. DLLM-JEPA improves over diffusion-only fine-tuning in every (task, architecture) combination we evaluate: up to +18.7 pp on LLaDA-8B GSM8K and +11.

DLLM-JEPA: Joint Embedding Predictive Architectures for Masked Diffusion Language Models 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品查看全部 (7)

相关技术查看全部 (5)