Zero-Shot 3D Question Answering via Hierarchical View-to-Token Transportation (Event)

PRODUCT_LAUNCH | 2026-06-03 | Impact: MEDIUM

arXiv:2606.03100v1 Announce Type: new

Abstract: Zero-shot 3D scene understanding via 2D Vision-Language Models (VLMs) has recently attracted growing research interest because of these models' promising spatial reasoning capabilities. Typically, multiple 2D views are sampled from a 3D point cloud and fed into a pre-trained VLM to answer a given question. This paradigm highlights the critical role of input context quality and raise