RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation

arXiv cs.CV, 2026-06-02
Authors: Lexi Pang, Liheng Zhang, Hang Ye, Xiaoxuan Ma, Yizhou Wang

Abstract

arXiv:2507.02792v5

Text-to-image (T2I) diffusion models have shown remarkable success in generating high-quality images from text prompts. Recent efforts extend these models to incorporate conditional images (e.g., Canny edges) for fine-grained spatial control. Among them, feature injection methods have emerged as a training-free alternative to traditional fine-tuning-based approaches. However, they often suffer from structural misalignment, condition leakage, and visual artifacts, especially when the condition image diverges significantly from natural RGB distributions. Through an analysis of existing methods, we identify a key limitation: the sampling schedule of condition features, previously unexplored, fails to account for the evolving interplay between structure preservation and domain alignment throughout diffusion steps.

