TediGAN: Text-Guided Diverse Face Image Generation and Manipulation (Paper)

2021 · Citations: 342

Topics: Artificial Intelligence; Generative Adversarial Networks and Image Synthesis; Face recognition and analysis; Multimodal Machine Learning Applications

Abstract

In this work, we propose TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions. The proposed method consists of three components: a StyleGAN inversion module, visual-linguistic similarity learning, and instance-level optimization. The inversion module maps real images to the latent space of a well-trained StyleGAN. The visual-linguistic similarity module learns text-image matching by mapping the image and text into a common embedding space. The instance-level optimization preserves identity during manipulation. Our model can produce diverse and high-quality images at an unprecedented resolution of 1024². Using a control mechanism based on style mixing, TediGAN inherently supports image synthesis with multi-modal inputs, such as sketches or semantic labels, with or without instance guidance. To facilitate text-guided multi-modal synthesis, we propose Multi-Modal CelebA-HQ, a large-scale dataset consisting of real face images with corresponding semantic segmentation maps, sketches, and textual descriptions. Extensive experiments on the introduced dataset demonstrate the superior performance of our proposed method. Code and data are available at https://github.com/weihaox/TediGAN.
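The style-mixing control mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the TediGAN repository. It assumes a layer-wise latent code of 18 style vectors with 512 dimensions each, and that mixing means copying a chosen subset of layers from one code into another. The helper names (`random_code`, `style_mix`), the random stand-in codes, and the choice of layers 8 to 17 are all hypothetical.

```python
import random

NUM_LAYERS = 18   # assumed number of style layers for a 1024x1024 generator
LATENT_DIM = 512  # assumed width of each per-layer style vector


def random_code(seed):
    """Stand-in for an encoder output: one style vector per generator layer."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
            for _ in range(NUM_LAYERS)]


def style_mix(content_code, attribute_code, attribute_layers):
    """Copy content_code, taking the selected layers from attribute_code."""
    if len(content_code) != len(attribute_code):
        raise ValueError("codes must have the same number of layers")
    chosen = set(attribute_layers)
    return [list(attribute_code[i]) if i in chosen else list(content_code[i])
            for i in range(len(content_code))]


# Hypothetical inputs: in a real pipeline these would come from the image
# inversion module and the text encoder, not from a random generator.
image_code = random_code(seed=0)
text_code = random_code(seed=1)

# Hypothetical split: layers 8-17 come from the text code, the rest are kept.
mixed = style_mix(image_code, text_code, attribute_layers=range(8, NUM_LAYERS))
```

Feeding `mixed` to a generator would then combine properties of both inputs. Which layers govern which attributes is an empirical matter that this sketch does not settle.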

Authors

Baoyuan Wu

Jing‐Hao Xue

Yujiu Yang

Weihao Xia
