Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models

ArXiv CS.CL | 2026-06-01 | NEWS | en | Authors: Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

Details

Source site: ArXiv CS.CL
Authors: Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li
Article type: NEWS
Language: en
Publication date: 2026-06-01

Abstract

arXiv:2510.11683v3. Announce type: replace-cross.

A key challenge in applying reinforcement learning (RL) to diffusion large language models (dLLMs) is that their likelihood functions, which are essential to the RL objective, are intractable and must be approximated during training. Existing methods approximate the log-likelihoods by their evidence lower bounds (ELBOs) via customized Monte Carlo (MC) sampling. However, they incur significant memory overhead because all MC samples must be retained to compute the gradients of the non-linear terms in the RL objective. This restricts the feasible sample size, leading to imprecise likelihood approximations and a distorted RL objective. To address this, we propose Boundary-Guided Policy Optimization (BGPO), a memory-efficient RL algorithm that maximizes a specially constructed lower bound of the ELBO-based objective.
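The memory issue described in the abstract can be illustrated with a toy sketch. It is not the paper's construction: the quadratic per-sample term, the function names, and the use of the generic inequality exp(x) >= 1 + x as a linear lower bound are all assumptions made for illustration. The sketch shows that a non-linear objective scales every per-sample gradient by a factor that depends on all MC samples, whereas a linear lower bound lets the gradient be accumulated one sample at a time.

```python
import math

# Hypothetical per-sample ELBO term b_j(theta) and its derivative.
# A toy quadratic stands in for a real model's MC sample of the ELBO.
def elbo_term(theta, c):
    return -(theta - c) ** 2

def elbo_term_grad(theta, c):
    return -2.0 * (theta - c)

def mean_elbo(theta, centers):
    return sum(elbo_term(theta, c) for c in centers) / len(centers)

def nonlinear_grad(theta, centers, b_old, adv):
    # Objective: adv * exp(mean_j b_j - b_old).
    # The exp factor depends on every sample, so all b_j must be known
    # before any per-sample gradient can be scaled correctly.
    n = len(centers)
    scale = adv * math.exp(mean_elbo(theta, centers) - b_old)
    return scale * sum(elbo_term_grad(theta, c) for c in centers) / n

def linear_bound_grad(theta, centers, adv):
    # Surrogate: adv * (1 + mean_j b_j - b_old), a lower bound of the
    # objective above when adv > 0, because exp(x) >= 1 + x.
    # Each sample contributes independently, so the gradient is
    # accumulated one sample at a time.
    n = len(centers)
    grad = 0.0
    for c in centers:
        grad += adv * elbo_term_grad(theta, c) / n
    return grad

def objectives(theta, centers, b_old, adv):
    # Returns (non-linear objective, linear surrogate) for comparison.
    b = mean_elbo(theta, centers)
    return adv * math.exp(b - b_old), adv * (1.0 + b - b_old)
```

In this toy setting, when the old and current parameters coincide, the surrogate and the non-linear objective agree in both value and gradient, and for a positive advantage the surrogate never exceeds the non-linear objective. A negative advantage would need a different bound, which this sketch does not cover.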
