OpenScene: 3D Scene Understanding with Open Vocabularies (Paper)

Year: 2023; Citations: 290

Topics: 3D Surveying and Cultural Heritage; Human Pose and Action Recognition; Advanced Image and Video Retrieval Techniques

Abstract

Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision. We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space. This zero-shot approach enables task-agnostic training and open-vocabulary queries. For example, to perform SOTA zero-shot 3D semantic segmentation it first infers CLIP features for every 3D point and later classifies them based on similarities to embeddings of arbitrary class labels. More interestingly, it enables a suite of open-vocabulary scene understanding applications that have never been done before. For example, it allows a user to enter an arbitrary text query and then see a heat map indicating which parts of a scene match. Our approach is effective at identifying objects, materials, affordances, activities, and room types in complex 3D scenes, all using a single model trained without any labeled 3D data.
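The two query modes described in the abstract (per-point classification against arbitrary class labels, and a heat map for a free-text query) can be illustrated with a minimal sketch. It assumes per-point features and text embeddings are already available as plain vectors in a shared space; the function names, the toy 3-dimensional vectors, and the min-max scaling of the heat map are illustrative assumptions, not the authors' implementation, and no real CLIP model is called.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarities(point_features, text_embeddings):
    """Return sims[i][j], the cosine similarity of point i and text j."""
    points = [normalize(p) for p in point_features]
    texts = [normalize(t) for t in text_embeddings]
    return [[sum(a * b for a, b in zip(p, t)) for t in texts] for p in points]


def classify_points(point_features, label_embeddings):
    """Assign each point the index of its most similar label embedding."""
    sims = cosine_similarities(point_features, label_embeddings)
    return [max(range(len(row)), key=row.__getitem__) for row in sims]


def query_heatmap(point_features, query_embedding):
    """Min-max scale each point's similarity to a single query into [0, 1]."""
    sims = cosine_similarities(point_features, [query_embedding])
    scores = [row[0] for row in sims]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Toy stand-ins (assumption): real CLIP features are high-dimensional and
# would come from the trained 3D model and the CLIP text encoder.
point_features = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]
label_names = ["chair", "table", "floor"]
label_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

predicted = [label_names[i] for i in classify_points(point_features, label_embeddings)]
heat = query_heatmap(point_features, label_embeddings[0])
print(predicted)
print(heat)
```

Because the label set is only a list of embeddings supplied at query time, changing the vocabulary in this sketch means swapping `label_embeddings` without retraining anything, which is the property the abstract calls open-vocabulary.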

Authors (4 of 6 shown)

Thomas Funkhouser

Marc Pollefeys

Andrea Tagliasacchi

Chiyu Jiang
