Beyond Pairwise Preferences: Listwise Reward-Aware Alignment for Diffusion Models

PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM

arXiv:2605.26491v1 Announce Type: cross

Abstract: Preference optimization has emerged as an efficient alternative to online reinforcement learning from human feedback (RLHF) for aligning text-to-image diffusion models. However, existing methods largely reduce supervision to binary pairwise comparisons. This pairwise reduction is limiting when training data naturally contains multiple candidate images for the same