Alignment-Guided Score Matching for Text-to-Image Alignment in Diffusion Models

ArXiv CS.CV | 2026-05-29 | Authors: Jaa-Yeon Lee, Yeobin Hong, Taesung Kwon, Jong Chul Ye

Abstract

arXiv:2605.30038v1 (announce type: cross). Diffusion models generate highly realistic images but often struggle with precise text-image alignment. Recent post-training methods improve alignment using external rewards or human preference signals, but their performance depends heavily on reward quality, and they do not directly address alignment within the diffusion process itself. Recent reward-free approaches such as SoftREPA demonstrate that optimizing soft text tokens via contrastive learning can effectively improve text-image representation alignment, outperforming standard parameter-efficient fine-tuning baselines. However, the contrastive formulation can excessively penalize negative pairs, which manifests as characteristic failure cases such as over-counting and repetition.
