Beyond Pairwise Preferences: Listwise Reward-Aware Alignment for Diffusion Models · Event
PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM
arXiv:2605.26491v1 · Announce Type: cross
Abstract: Preference optimization has emerged as an efficient alternative to online reinforcement learning from human feedback (RLHF) for aligning text-to-image diffusion models. However, existing methods largely reduce supervision to binary pairwise comparisons. This pairwise reduction is limiting when training data naturally contains multiple candidate images for the same
Related coverage
Beyond Pairwise Preferences: Listwise Reward-Aware Alignment for Diffusion Models
ArXiv CS.CV · 2026-05-27