Dual-Pathway Geometry-Aware MLLM for Spatial Intelligence

arXiv cs.CV · 2026-05-26 · Authors: Yufei Zheng, Xuhan Zhu, Zide Liu, Chunpeng Zhou, Chenfeng Wang, Yongchao Xu, Yunnan Wang, Jiawei Liu, Pengfei Yu, Wei Zhai, Yang Cao, Zheng-Jun Zha

Abstract

arXiv:2605.25334v1 (announce type: new)

Spatial understanding of the physical world from 2D visual inputs hinges on two complementary forms of geometric knowledge: holistic 3D structural perception and fine-grained metric scale estimation. Existing multimodal large language models (MLLMs) typically address only one facet, ingesting either depth maps or point clouds as additional model inputs, which incurs substantial computational overhead and inherits the generalization limitations of upstream prediction models. We propose GAMSI, a dual-pathway Geometry-Aware MLLM for Spatial Intelligence that takes only RGB images as input while internalizing both forms of geometric prior within a unified autoregressive backbone.
