DRM: Diffusion-based Reward Model With Step-wise Guidance

ArXiv CS.CV, 2026-05-26. Authors: Jaxon Zhang, Binxin Yang, Hubery Yin, Chen Li, Jing Lyu

Abstract

arXiv:2605.25661v1 (announce type: new). Current mainstream methods for aligning diffusion models with human preferences typically employ VLM-based reward models. However, these reward models, pre-trained for semantic alignment, struggle to capture essential perceptual qualities such as aesthetics, composition, and visual harmony. In this work, we argue that a model capable of high-fidelity generation must possess a profound understanding of these visual attributes. Based on this insight, we introduce the Diffusion-based Reward Model (DRM), a novel paradigm that uses a pre-trained diffusion model as a powerful evaluative backbone. A key advantage of the DRM is its ability to assess not only the final image but also the noisy intermediate latents at any stage of the generative process. We leverage this step-wise evaluative capacity in two ways.
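To make the idea of scoring noisy intermediate latents concrete, here is a minimal toy sketch. It is not the paper's method: the abstract does not describe the architecture, the noise schedule, or the training objective. The sketch assumes a standard variance-preserving forward process with a simplified cosine-shaped schedule, emulates intermediate latents by noising a clean latent, and replaces the pre-trained diffusion backbone with a hypothetical hand-written feature function. All names (`alpha_bar`, `add_noise`, `toy_features`, `toy_reward`, `stepwise_rewards`) and all weights are illustrative assumptions.

```python
import math
import random

NUM_STEPS = 1000  # assumed number of diffusion timesteps (not from the paper)


def alpha_bar(t, num_steps=NUM_STEPS):
    # Simplified cosine-shaped signal level: 1.0 at t=0, close to 0.0 at t=num_steps.
    return math.cos(0.5 * math.pi * t / num_steps) ** 2


def add_noise(latent, t, rng, num_steps=NUM_STEPS):
    # Variance-preserving forward process: x_t = sqrt(ab) * x_0 + sqrt(1 - ab) * eps.
    ab = alpha_bar(t, num_steps)
    signal = math.sqrt(ab)
    noise = math.sqrt(max(0.0, 1.0 - ab))
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in latent]


def toy_features(noisy_latent, t, num_steps=NUM_STEPS):
    # Hypothetical stand-in for backbone features. A real diffusion-based
    # reward model would use a pre-trained diffusion network here.
    n = len(noisy_latent)
    mean = sum(noisy_latent) / n
    energy = sum(x * x for x in noisy_latent) / n
    return [mean, energy, t / num_steps]


def toy_reward(noisy_latent, t, weights):
    # Linear head over timestep-conditioned features (illustrative weights).
    feats = toy_features(noisy_latent, t)
    return sum(w * f for w, f in zip(weights, feats))


def stepwise_rewards(latent, timesteps, weights, seed=0):
    # Score the same latent at several noise levels, one reward per timestep.
    rng = random.Random(seed)
    return [(t, toy_reward(add_noise(latent, t, rng), t, weights)) for t in timesteps]


clean_latent = [0.5, -1.0, 0.25, 2.0]
scores = stepwise_rewards(clean_latent, [0, 500, 999], weights=[1.0, -0.1, 0.0])
for t, r in scores:
    print(f"t={t:4d} reward={r:.4f}")
```

The point of the sketch is only the interface: the reward function takes both a noisy latent and its timestep, so it can be queried at any stage rather than only on the final image.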
