Bridging the 2D-3D Gap: A Hierarchical Semantic-Geometric Map for Vision Language Navigation (Event)

PRODUCT_LAUNCH · 2026-06-02 · Impact: MEDIUM

arXiv:2606.00095v1 · Announce Type: new

Abstract: Vision-Language Navigation (VLN) enables embodied agents to reach target locations in unseen environments by following language instructions. Despite recent progress with vision-language models (VLMs), a critical semantic-geometric gap remains: while VLMs excel at language and 2D visual understanding, they struggle with 3D spatial reasoning and fail to cap…