VLM3: Vision Language Models Are Native 3D Learners (Event)

PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM

arXiv:2605.30561v1 | Announce Type: new

Abstract: Vision Language Models (VLMs) enable a unified model to solve various vision tasks through prompting. They have shown promising performance in semantic understanding. However, 3D understanding still largely relies on expert vision models with complex task-specific designs. The key argument this work wants to make is that VLMs are native 3D learners. Our in-depth large-scale study shows that 1) fo