SpaCeFormer: Fast Proposal-Free Open-Vocabulary 3D Instance Segmentation

arXiv cs.CV | 2026-06-01 | News | English
Authors: Chris Choy, Junha Lee, Chunghyun Park, Minsu Cho, Jan Kautz

Abstract

arXiv:2604.20395v2 (Announce Type: replace)

Open-vocabulary 3D instance segmentation is a core capability for robotics and AR/VR, but prior methods trade one bottleneck for another: multi-stage 2D+3D pipelines aggregate foundation-model outputs at hundreds of seconds per scene, while pseudo-labeled end-to-end approaches rely on fragmented masks and external region proposals. We present SpaCeFormer, a proposal-free space-curve transformer that runs in 0.12–0.30 seconds per scene across standard benchmarks, 2–3 orders of magnitude faster than multi-stage 2D+3D pipelines. We pair it with SpaCeFormer-3M, the largest open-vocabulary 3D instance segmentation dataset (3.0M multi-view-consistent captions over 604K instances from 7.4K scenes), built through multi-view mask clustering and multi-view VLM captioning; it reaches 21× higher mask recall than prior single-view pipelines (54.3% vs. 2.5% at IoU > 0.5).
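The recall figure above is a ratio of two percentages (54.3 / 2.5 ≈ 21.7, consistent with the stated 21×). The abstract does not spell out the matching protocol, so the sketch below assumes one common definition: the share of ground-truth instances covered by at least one mask with IoU strictly above 0.5. Masks are represented here, as an assumption, as sets of point indices, and the function names `mask_iou` and `mask_recall` are hypothetical illustrations, not the authors' code.

```python
def mask_iou(a: set[int], b: set[int]) -> float:
    """Intersection-over-union of two instance masks given as sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mask_recall(gt_masks: list[set[int]], pred_masks: list[set[int]],
                iou_thresh: float = 0.5) -> float:
    """Assumed definition: fraction of ground-truth masks matched by at least
    one predicted mask whose IoU exceeds iou_thresh (strict inequality)."""
    if not gt_masks:
        return 0.0
    hits = sum(
        1 for gt in gt_masks
        if any(mask_iou(gt, pred) > iou_thresh for pred in pred_masks)
    )
    return hits / len(gt_masks)


# Toy scene: three ground-truth instances, two candidate masks.
gt = [{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9}]
pred = [{0, 1, 2}, {4, 9}]
print(mask_recall(gt, pred))  # only the first instance clears IoU > 0.5
```

In the toy scene, the first instance is matched with IoU 0.75, while the other two reach at most 0.2 and about 0.33, so recall is one third. This per-instance, any-match definition ignores one-to-one assignment; a benchmark that enforces unique matching could report lower numbers.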
