Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation

ArXiv CS.CV | 2026-05-26 | NEWS | en | Authors: Zihan Su, Hongyang Wei, Kangrui Cen, Yong Wang, Guanhua Chen, Chun Yuan, Xiangxiang Chu

Details

Source site: ArXiv CS.CV
Authors: Zihan Su, Hongyang Wei, Kangrui Cen, Yong Wang, Guanhua Chen, Chun Yuan, Xiangxiang Chu
Article type: NEWS
Language: en
Publication date: 2026-05-26

Abstract

arXiv:2601.21406v3 (announce type: replace)

Unified Multimodal Models (UMMs) integrate both visual understanding and generation within a single framework. Their ultimate aspiration is to create a cycle where understanding and generation mutually reinforce each other. While recent post-training methods have successfully leveraged understanding to enhance generation, the reverse direction of utilizing generation to improve understanding remains largely unexplored. In this work, we propose UniMRG (Unified Multi-Representation Generation), a simple yet effective architecture-agnostic post-training method. UniMRG enhances the understanding capabilities of UMMs by incorporating auxiliary generation tasks. Specifically, we train UMMs to generate multiple intrinsic representations of input images, namely pixel (reconstruction), depth (geometry), and segmentation (structure), alongside standard visual understanding objectives.
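
The training recipe described in the abstract can be pictured as a multi-task objective: one understanding loss plus three auxiliary generation losses. The following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how the objectives are combined, so the weighted sum, the `AuxWeights` container, and the `combined_loss` function are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuxWeights:
    """Hypothetical per-task weights; the abstract does not specify any."""
    pixel: float = 1.0         # reconstruction of the input image
    depth: float = 1.0         # geometry (depth map generation)
    segmentation: float = 1.0  # structure (segmentation map generation)


def combined_loss(understanding: float,
                  pixel: float,
                  depth: float,
                  segmentation: float,
                  weights: AuxWeights = AuxWeights()) -> float:
    """Assumed combination: understanding loss plus weighted auxiliary losses."""
    return (understanding
            + weights.pixel * pixel
            + weights.depth * depth
            + weights.segmentation * segmentation)


# Example with made-up scalar loss values for a single training step.
step_loss = combined_loss(understanding=1.0, pixel=0.5, depth=0.25, segmentation=0.25)

# Setting every auxiliary weight to zero recovers an understanding-only objective.
baseline_loss = combined_loss(1.0, 0.5, 0.25, 0.25, AuxWeights(0.0, 0.0, 0.0))
```

In a real training loop the four inputs would be per-batch losses produced by the model, and the weights would be tuned; the values above are placeholders.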
