摘要
arXiv:2605.25334v1 Announce Type: new Abstract: Spatial understanding of the physical world from 2D visual inputs hinges on two complementary forms of geometric knowledge: holistic 3D structural perception and fine-grained metric scale estimation. Existing multimodal large language models (MLLMs) typically address only one facet, ingesting either depth maps or point clouds as additional model inputs, which incurs substantial computational overhead and inherits the generalization limitations of upstream prediction models. We propose GAMSI, a dual-pathway Geometry-Aware MLLM for Spatial Intelligence that takes only RGB images as input while internalizing both forms of geometric prior within a unified autoregressive backbone.
相关事件查看全部 (1)
Dual-Pathway Geometry-Aware MLLM for Spatial Intelligence
2026-05-26PRODUCT_LAUNCH影响: MEDIUM
相关公司
暂无数据
相关人物
暂无数据