DRM: Diffusion-based Reward Model With Step-wise Guidance (Event)

PRODUCT_LAUNCH 2026-05-26 Impact: MEDIUM

arXiv:2605.25661v1 (Announce Type: new) Abstract: Current mainstream methods for aligning diffusion models with human preferences typically employ VLM-based reward models. However, these reward models, pre-trained for semantic alignment, struggle to capture essential perceptual qualities such as aesthetics, composition, and visual harmony. In this work, we argue that a model capable of high-fidelity generation must possess a profound un