Beyond Pairwise Preferences: Listwise Reward-Aware Alignment for Diffusion Models (Event)

PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2605.26491v1 · Announce Type: cross

Abstract: Preference optimization has emerged as an efficient alternative to online reinforcement learning from human feedback (RLHF) for aligning text-to-image diffusion models. However, existing methods largely reduce supervision to binary pairwise comparisons. This pairwise reduction is limiting when training data naturally contains multiple candidate images for the same…
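The limitation described in the abstract can be made concrete with a small sketch. This is not the paper's method (the abstract is cut off before describing it); it is a generic illustration that assumes each prompt has K candidate images, each with a scalar reward. The hypothetical helper `pairwise_reduction` turns the list into (winner, loser) pairs, as a pairwise method would. `listnet_loss` is the standard ListNet-style softmax cross-entropy, used here only as one common example of a listwise objective. The `scores` values are placeholder numbers chosen for illustration and stand in for whatever per-image score a model assigns.

```python
import math
from itertools import combinations


def pairwise_reduction(rewards):
    """Reduce K scored candidates to (winner, loser) index pairs.

    Only the ordering survives; the size of each reward gap is discarded.
    Ties produce no pair.
    """
    pairs = []
    for i, j in combinations(range(len(rewards)), 2):
        if rewards[i] > rewards[j]:
            pairs.append((i, j))
        elif rewards[j] > rewards[i]:
            pairs.append((j, i))
    return pairs


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_logistic_loss(scores, pairs):
    """Bradley-Terry style loss: mean of -log sigmoid(s_w - s_l) over pairs."""
    total = 0.0
    for w, l in pairs:
        # -log sigmoid(x) == log(1 + exp(-x))
        total += math.log1p(math.exp(-(scores[w] - scores[l])))
    return total / len(pairs)


def listnet_loss(scores, rewards, temperature=1.0):
    """ListNet-style listwise loss (illustrative, not the paper's objective).

    Cross-entropy between a target distribution built from the rewards
    and the model's softmax distribution over the same candidate list.
    """
    target = softmax(rewards, temperature)
    pred = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, pred))


if __name__ == "__main__":
    # Two hypothetical reward lists with the same ordering but different gaps.
    a = [4.0, 3.0, 2.0, 1.0]
    b = [4.0, 3.9, 3.8, 1.0]
    scores = [0.5, 0.2, 0.1, -0.3]  # placeholder model scores

    print(pairwise_reduction(a) == pairwise_reduction(b))  # True
    print(pairwise_logistic_loss(scores, pairwise_reduction(a)))
    print(pairwise_logistic_loss(scores, pairwise_reduction(b)))
    print(listnet_loss(scores, a), listnet_loss(scores, b))
```

Because `a` and `b` rank the candidates identically, they yield the same pairs and therefore the same pairwise loss, even though `b` says the top three images are nearly equally good. The listwise target built from the rewards differs between the two lists, so a listwise loss can tell them apart. This is the kind of information that reducing multi-candidate data to binary comparisons throws away.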
