Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation

arXiv cs.CV, 2026-05-26. Authors: Aviral Chharia, Fernando De la Torre

Abstract

arXiv:2605.25220v1 (announce type: new)

High-fidelity 3D Gaussian head avatar generation is critical for applications such as AR/VR, telepresence, and digital humans. Existing methods depend on multi-view datasets, 3D captures, or intermediate 2D view synthesis. In contrast, we learn both conditional and unconditional 3D head models from randomly sampled 2D images alone, without using multi-view data, 3D supervision, or intermediate view generation. We introduce MVCHead, a single-shot state space model that enforces multi-view consistency (MVC) directly in the 3D representation while regressing 3D Gaussians under these constraints. At its core, we propose a Hierarchical State Space (HiSS) block that progressively refines Gaussians from coarse to fine, while capturing long-range dependencies.
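The abstract does not specify how the HiSS block is built, so the following is only a toy illustration of the two ideas it names: a linear state space recurrence, whose hidden state carries information across a whole sequence, and coarse-to-fine refinement. Everything here is an assumption made for illustration: the function names `ssm_scan`, `upsample`, and `coarse_to_fine`, the scalar tokens, the fixed coefficients, and the residual update are hypothetical and are not taken from MVCHead.

```python
from typing import List


def ssm_scan(xs: List[float], a: float = 0.9, b: float = 0.1, c: float = 1.0) -> List[float]:
    """Scalar linear state space recurrence (generic, not the paper's block).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # the state accumulates all earlier inputs
        ys.append(c * h)
    return ys


def upsample(xs: List[float], factor: int = 2) -> List[float]:
    """Repeat each token `factor` times (nearest-neighbour upsampling)."""
    return [x for x in xs for _ in range(factor)]


def coarse_to_fine(tokens: List[float], levels: int = 3) -> List[float]:
    """Refine tokens over `levels` stages, doubling resolution between stages.

    Assumption: each stage adds the scan output as a residual. A real model
    would use learned, vector-valued tokens that are decoded into Gaussian
    parameters; this toy version only shows the control flow.
    """
    for level in range(levels):
        residual = ssm_scan(tokens)
        tokens = [t + r for t, r in zip(tokens, residual)]
        if level < levels - 1:
            tokens = upsample(tokens)
    return tokens


if __name__ == "__main__":
    coarse = [1.0, 0.0, 0.0, 0.0]
    fine = coarse_to_fine(coarse, levels=3)
    print(len(fine))  # 4 tokens doubled twice
```

In this sketch an impulse at the first token still influences the last output of `ssm_scan`, which is the sense in which a state space recurrence captures long-range dependencies.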