Grounded 3D-Aware Spatial Vision-Language Modeling
Event: PRODUCT_LAUNCH | Date: 2026-05-29 | Impact: MEDIUM
arXiv:2605.30307v1 (Announce Type: new). Abstract: We present GR3D, a spatial vision-language model equipped with three complementary grounding capabilities (explicit 2D grounding, implicit 2D grounding, and monocular 3D grounding) within a single framework. GR3D introduces an implicit grounding mechanism that identifies entity mentions during generation and inserts the corresponding region tokens into the text stream, allowing the model to refere