Diff-Instruct with Diffused Reward: Towards Principled One-step Generator RL

arXiv cs.CV | 2026-05-27 | Authors: Junyi Wu, Weijian Luo, Haoyang Zheng, Ruizhe Zhang, Guang Lin

Abstract

arXiv:2605.24001v2 (announce type: replace)

Recent advances in one-step text-to-image generation have enabled real-time synthesis with remarkable efficiency and quality. Previous reinforcement learning methods for one-step generators combine reward optimization in image space with distribution matching in the diffusion noisy space. This combination is problematic because terminal reward optimization is mismatched with the underlying generative dynamics. As a result, optimization tends to exploit stochastic degrees of freedom, often improving reward at the expense of image fidelity. To address this issue, we propose Diff-Instruct with Diffused Reward (DIDR), a data-free, trajectory-level alignment framework derived from Integral KL minimization. DIDR propagates the RLHF-optimal, reward-tilted clean-image distribution across all noise levels along the diffusion trajectory.
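The abstract does not give formulas, so the following is only a sketch of what "reward-tilted distribution propagated across noise levels" and "Integral KL" usually mean in this line of work. All symbols are assumptions introduced for illustration and are not taken from the paper: p_ref is a reference clean-image distribution, r a reward, β > 0 a regularization strength, (α_t, σ_t) a Gaussian forward-diffusion schedule, q_{θ,t} the one-step generator's output distribution diffused to noise level t, and w(t) a positive weighting. The paper's exact objective, weighting, and estimator may differ.

```latex
% Standard closed-form optimum of KL-regularized reward maximization
% (the "RLHF-optimal reward-tilted" clean-image distribution):
\[
p^{*}(x_0) \;=\; \frac{1}{Z}\, p_{\mathrm{ref}}(x_0)\,
\exp\!\big( r(x_0)/\beta \big),
\qquad
Z = \int p_{\mathrm{ref}}(x_0)\, \exp\!\big( r(x_0)/\beta \big)\, dx_0 .
\]

% Propagating it to noise level t with an assumed Gaussian forward kernel:
\[
p^{*}_{t}(x_t) \;=\; \int \mathcal{N}\!\big( x_t;\ \alpha_t x_0,\ \sigma_t^{2} I \big)\,
p^{*}(x_0)\, dx_0 .
\]

% Integral KL between the diffused generator distribution and the diffused target,
% accumulated over all noise levels (trajectory-level matching):
\[
\mathcal{D}_{\mathrm{IKL}}\big( q_{\theta} \,\|\, p^{*} \big)
\;=\; \int_{0}^{T} w(t)\,
\mathrm{KL}\!\big( q_{\theta,t} \,\|\, p^{*}_{t} \big)\, dt .
\]
```

Under these assumptions the reward enters through the target marginal at every noise level rather than only at the clean image, which is one way to read the abstract's contrast between terminal reward optimization and trajectory-level alignment.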
