HOLA: Holistic Multi-Modal Alignment for Open-Set 3D Recognition

arXiv cs.CV | 2026-06-02 | Authors: Koby Aharonov, Oren Shrout, Ayellet Tal

Abstract

arXiv:2606.01334v1

Open-set 3D recognition requires models that generalize to rare or unseen categories. Recent approaches address this by distilling language-vision knowledge into 3D encoders, typically relying on heavy 2D ViTs and aligning each point cloud with a single image or caption, thus anchoring representations to partial views. We propose aligning each point cloud with multiple images and textual descriptions to capture a more holistic understanding of 3D objects. To realize this idea, it is essential to design a loss function capable of jointly aligning a 3D instance with multiple matched signals (multi-view images and multiple texts) while separating positive aggregation from negative competition. We introduce such a function, termed the decoupled multi-positive contrastive loss.
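The abstract names the loss but does not give its formula. The sketch below is only one plausible reading, offered to make the phrase "separating positive aggregation from negative competition" concrete; it is not the authors' definition. It assumes cosine similarity, a temperature of 0.07, positives aggregated by a plain mean, and a log-sum-exp computed over negatives only. The function name `decoupled_multi_positive_loss` and all parameters are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm so that dot() yields cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def decoupled_multi_positive_loss(anchor, positives, negatives, temperature=0.07):
    """Illustrative loss for one 3D embedding (assumed form, not from the paper).

    anchor:    embedding of the point cloud
    positives: embeddings of its matched images and texts
    negatives: embeddings of signals belonging to other objects
    """
    a = normalize(anchor)
    pos_logits = [dot(a, normalize(p)) / temperature for p in positives]
    neg_logits = [dot(a, normalize(n)) / temperature for n in negatives]

    # Positive aggregation (assumption): average the logits of all matched signals.
    pos_term = sum(pos_logits) / len(pos_logits)

    # Negative competition (assumption): log-sum-exp over negatives only,
    # so positives never compete with each other in the denominator.
    m = max(neg_logits)  # subtract the max for numerical stability
    neg_term = m + math.log(sum(math.exp(x - m) for x in neg_logits))

    return neg_term - pos_term


# Toy 3-dimensional embeddings, invented purely for illustration.
anchor = [1.0, 0.0, 0.0]
negatives = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]      # close to the anchor
misaligned = [[0.0, 1.0, 0.1], [0.0, 0.1, 1.0]]   # orthogonal to the anchor

loss_aligned = decoupled_multi_positive_loss(anchor, aligned, negatives)
loss_misaligned = decoupled_multi_positive_loss(anchor, misaligned, negatives)
```

In this toy setup the loss is lower when the matched signals lie close to the anchor than when they are orthogonal to it. Because the positives appear only in the averaged term, adding more views or captions changes the pull toward the anchor without adding terms to the negative competition.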
