ViewMask-1-to-3: Multi-View Consistent Image Generation via Multimodal Discrete Diffusion Models

ArXiv CS.CV | 2026-06-04 | NEWS | en | Authors: Ruishu Zhu, Zhihao Huang, Jiacheng Sun, Ping Luo, Hongyuan Zhang, Xuelong Li

Details

Source site
ArXiv CS.CV
Authors
Ruishu Zhu, Zhihao Huang, Jiacheng Sun, Ping Luo, Hongyuan Zhang, Xuelong Li
Article type
NEWS
Language
en
Publication date
2026-06-04

Abstract

arXiv:2512.14099v3 (announce type: replace)

Motivated by the success of discrete diffusion in language-vision modeling, we explore its potential for multi-view generation, a task dominated by continuous approaches. We introduce ViewMask-1-to-3, which formulates multi-view generation as a discrete sequence modeling problem in which each viewpoint is represented as a sequence of MAGVIT-v2 visual tokens. Using discrete diffusion with masked token prediction, our approach generates views progressively through iterative token unmasking and unifies language and vision in a shared token space. Importantly, simple random masking combined with self-attention naturally encourages cross-view consistency without specialized architectures or 3D geometric priors. Our method outperforms the baseline on the GSO and 3D-FUTURE benchmarks, ranks first on average across standard image metrics, and achieves a 10.6% higher IoU than continuous diffusion models on 3D-FUTURE.
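The abstract frames generation as masked token prediction followed by iterative unmasking over one joint sequence of views. The following is a minimal plain-Python sketch of that idea, not the authors' implementation. Several pieces are assumptions: integer ids stand in for MAGVIT-v2 tokens, `MASK = -1` is a hypothetical mask id, the cosine schedule and "commit the most confident predictions first" rule are borrowed from MaskGIT-style decoders rather than stated in the abstract, and the predictor is a hypothetical stand-in for the self-attention model. All function names are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

MASK = -1  # hypothetical [MASK] id; real codebook ids are assumed to be >= 0

# Hypothetical predictor: maps the whole sequence to one (token, confidence)
# pair per position. In the paper this role is played by a self-attention model.
Predictor = Callable[[List[int]], List[Tuple[int, float]]]


def build_sequence(cond_view: Sequence[int], num_targets: int, tokens_per_view: int) -> List[int]:
    """Concatenate the input view's tokens with fully masked target views (3 for 1-to-3)."""
    return list(cond_view) + [MASK] * (num_targets * tokens_per_view)


def random_mask(tokens: Sequence[int], start: int, ratio: float, rng: random.Random) -> List[int]:
    """Training-time corruption (assumed form): mask a random fraction of the
    positions from `start` onward, i.e. across all target views jointly,
    while leaving the conditioning view intact."""
    positions = range(start, len(tokens))
    k = max(1, round(ratio * len(positions)))
    chosen = set(rng.sample(positions, k))
    return [MASK if i in chosen else t for i, t in enumerate(tokens)]


def iterative_unmask(seq: List[int], predict: Predictor, steps: int) -> List[int]:
    """Predict all masked tokens in parallel each step, then commit only the
    most confident ones; a cosine schedule sets how many stay masked."""
    seq = list(seq)  # do not mutate the caller's sequence
    total_masked = seq.count(MASK)
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        preds = predict(seq)
        # Tokens that should remain masked after this step; reaches 0 on the last step.
        remain = math.floor(total_masked * math.cos(math.pi / 2 * step / steps))
        n_reveal = max(1, len(masked) - remain)
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:n_reveal]:
            seq[i] = preds[i][0]
    return seq


if __name__ == "__main__":
    def toy_predict(seq: List[int]) -> List[Tuple[int, float]]:
        # Deterministic toy guesses; earlier positions get higher confidence.
        return [(i % 8, 1.0 / (1 + i)) for i in range(len(seq))]

    seq = build_sequence([3, 1, 4, 1], num_targets=3, tokens_per_view=4)
    print(random_mask([3, 1, 4, 1] + [0] * 12, start=4, ratio=0.5, rng=random.Random(0)))
    print(iterative_unmask(seq, toy_predict, steps=4))
```

Because every target view lives in the same sequence and is masked jointly, a model with full self-attention can condition on tokens already committed in one view when predicting another. In this sketch, that shared context is the only channel linking the views, which mirrors the abstract's claim that no dedicated cross-view module or 3D prior is required.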
