Mamba-Enhanced Implicit Motion Learning for Audio-Driven Portrait Animation

arXiv cs.CV | 2026-06-03 | Authors: Xuan Wei, Jiahui Chen, Kaiheng Li, Mingyu Shao, Qingqi Hong

Abstract

arXiv:2606.03402v1 (announce type: new). Audio-driven human motion video generation aims to synthesize realistic and temporally coherent human animations from a single static image, with applications in talking-head synthesis, co-speech gesture generation, and dynamic presentations. Conventional keypoint-based methods often struggle to capture subtle motion dynamics. Moving beyond them, we propose a novel implicit-motion framework that generates such videos from a single static image and audio. Our approach uses a two-stage pipeline that decouples motion prediction from rendering. The first stage integrates appearance priors and hierarchical depth cues into a region-aware attention mechanism to model latent motion features. The second stage employs a Mamba-enhanced diffusion model to predict these features directly from audio and the source image, enabling unsupervised learning of fine-grained motion patterns.
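The abstract does not say how Mamba is wired into the diffusion model, so the following is only a toy illustration of the general idea behind Mamba-style layers: a selective state-space scan whose step size and input/output projections depend on the current input. It is not the authors' architecture. The function name `selective_scan`, the parameters `A`, `w_B`, `w_C`, `w_delta`, and the single-channel setup are all assumptions made for this sketch.

```python
import math


def softplus(z):
    # Smooth positive function, used here to keep the step size > 0.
    return math.log1p(math.exp(z))


def selective_scan(xs, A, w_B, w_C, w_delta):
    """Toy single-channel selective state-space scan (hypothetical sketch).

    xs      : list of floats, e.g. one channel of an audio feature over time
    A       : list of negative floats, diagonal state matrix (state size N)
    w_B, w_C: lists of length N; make the input/output projections
              B_t = w_B * x_t and C_t = w_C * x_t depend on the input
    w_delta : float; makes the discretization step depend on the input
    """
    n = len(A)
    h = [0.0] * n          # hidden state carried across time steps
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)        # input-dependent step size
        y = 0.0
        for i in range(n):
            a_bar = math.exp(delta * A[i])   # discretized decay, in (0, 1)
            b_bar = delta * w_B[i] * x       # Euler-discretized B_t
            h[i] = a_bar * h[i] + b_bar * x  # state update
            y += (w_C[i] * x) * h[i]         # readout with C_t
        ys.append(y)
    return ys


# Example call on a short, made-up feature sequence.
outputs = selective_scan(
    [0.5, -1.0, 2.0],
    A=[-1.0, -0.5],
    w_B=[1.0, 0.5],
    w_C=[0.3, 0.7],
    w_delta=1.0,
)
```

The scan is causal and costs linear time in the sequence length, which is the usual motivation for using state-space layers on long audio sequences; a real Mamba block adds multi-channel projections, gating, and a hardware-efficient parallel scan on top of this recurrence.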
