Details
- Source site
- ArXiv CS.CV
- Authors
- Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, Yanwei Fu
- Article type
- NEWS
- Language
- en
- Publication date
- 2026-06-18
Abstract
arXiv:2502.07531v5 (announce type: replace)
Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. While precise control over camera motion, object motion, and lighting is essential for high-fidelity creation, existing methods often treat these factors independently. This overlooks the physical coupling among viewpoint, geometry, and illumination in dynamic scenes, leading to visual inconsistencies such as mismatched shadows and perspective drift under simultaneous changes. We present VidCRAFT3, a unified and flexible I2V framework that explicitly models cross-factor interactions among geometry, motion, and illumination, enabling both independent and joint control over camera motion, object motion, and lighting direction. Image2Cloud provides explicit 3D geometric priors for accurate camera motion control.
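To illustrate why an explicit 3D prior helps with camera control, here is a minimal sketch of the generic pinhole-camera idea: lift each pixel of a reference image to a 3D point using a depth map, then reproject the points into a moved camera. This is not the authors' Image2Cloud module, whose internals the abstract does not describe. The function names `unproject` and `project`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the translation-only camera are all assumptions made for illustration.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def unproject(depth: List[List[float]], fx: float, fy: float,
              cx: float, cy: float) -> List[Point3D]:
    """Lift every pixel (u, v) with depth z to a 3D point in the camera frame."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def project(points: List[Point3D], fx: float, fy: float, cx: float, cy: float,
            translation: Point3D = (0.0, 0.0, 0.0)) -> List[Pixel]:
    """Project 3D points into a camera moved by `translation` (hypothetical: no rotation)."""
    tx, ty, tz = translation
    pixels = []
    for x, y, z in points:
        # Express the point in the moved camera's frame.
        xc, yc, zc = x - tx, y - ty, z - tz
        if zc <= 0:
            continue  # Point is behind the moved camera; skip it.
        pixels.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return pixels


# A 2x2 image whose pixels all lie at depth 2.
cloud = unproject([[2.0, 2.0], [2.0, 2.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
same_view = project(cloud, 1.0, 1.0, 0.5, 0.5)
moved_view = project(cloud, 1.0, 1.0, 0.5, 0.5, translation=(1.0, 0.0, 0.0))
```

With no camera motion the points land back on their original pixel coordinates. Moving the camera one unit to the right shifts every point left by `fx * tx / z` pixels, so nearer points shift more than distant ones. That depth-dependent parallax is the kind of geometric cue an explicit point cloud makes available.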