Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
Event: PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM
arXiv:2510.11683v3 Announce Type: replace-cross
Abstract: A key challenge in applying reinforcement learning (RL) to diffusion large language models (dLLMs) is the intractability of their likelihood functions, which are essential for the RL objective, necessitating corresponding approximation during training. While existing methods approximate the log-likelihoods by their evidence lower bounds (ELBOs)