Fusion Embedding for Pose-Guided Person Image Synthesis with Diffusion Model

ArXiv cs.CV, 2026-05-26. Authors: Donghwna Lee, Kirok Kim, Jisu Lee, Kyungha Min, Wooju Kim

Abstract

arXiv:2412.07333v2

Pose-Guided Person Image Synthesis (PGPIS) aims to generate human images in specified poses while preserving the identity and appearance of a source image. This technology facilitates diverse applications, including virtual try-on, digital avatars, animation, and sign language generation. Despite the high-quality results of recent diffusion-based PGPIS, these models typically depend on implicit feature aggregation within the denoising process. As a result, fine-grained texture preservation is limited, and even for the same identity, it is difficult to ensure consistent generation under variations in pose and source appearance. To address these limitations, we propose Fusion Embedding for PGPIS using a Diffusion Model (FPDM), the first framework that explicitly aligns fused source-pose embeddings with target image embeddings via contrastive learning, and subsequently employs the learned fusion embedding as a conditioning signal for…

The abstract may be incomplete; see the original paper for the full text.
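
The contrastive alignment mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which contrastive objective FPDM uses, so the symmetric InfoNCE loss below is an assumption chosen because it is a common way to align two sets of embeddings; it should not be read as the authors' implementation. The function names, the temperature value, and the random vectors standing in for fused source-pose and target image embeddings are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(fusion_emb, target_emb, temperature=0.07):
    """Symmetric InfoNCE loss (an assumed objective, not confirmed by the source).

    Row i of fusion_emb and row i of target_emb form a positive pair;
    every other row in the batch serves as a negative.
    """
    f = l2_normalize(fusion_emb)
    t = l2_normalize(target_emb)
    logits = f @ t.T / temperature            # (batch, batch) similarity matrix
    idx = np.arange(len(f))
    # Fusion -> target: each fusion row should pick out its own target column.
    loss_f2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Target -> fusion: each target column should pick out its own fusion row.
    loss_t2f = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_f2t + loss_t2f)


# Hypothetical stand-ins for real encoder outputs.
rng = np.random.default_rng(0)
target = rng.normal(size=(8, 32))                    # "target image" embeddings
aligned = target + 0.01 * rng.normal(size=(8, 32))   # well-aligned "fusion" embeddings
shuffled = np.roll(target, 1, axis=0)                # pairs deliberately mismatched

loss_aligned = symmetric_info_nce(aligned, target)
loss_shuffled = symmetric_info_nce(shuffled, target)
```

In this toy setup the loss for the well-aligned batch is lower than for the mismatched one, which is the property a training loop would exploit to pull each fusion embedding toward its own target embedding. How the learned embedding is then fed to the diffusion model as a conditioning signal is not covered by the truncated abstract and is left out here.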