Joint 2D-3D Segmentation and Association in Street-level Imaging

arXiv cs.CV, 2026-05-27. Authors: Amir Melnikov, Masayuki Tanaka, Yusuke Monno, Masatoshi Okutomi

Abstract

arXiv:2605.26725v1

Accurate interpretation of street-level imagery is essential for large-scale urban mapping and the creation of Spatial Digital Twin (SDT) environments. This work presents a unified framework for joint 2D-3D segmentation and association that integrates visual semantics with multi-view geometric reasoning. Unlike conventional approaches that rely heavily on sequential frames for temporal tracking, our method leverages zero-shot detection and segmentation together with structure-from-motion reconstruction to establish stable cross-view correspondences. A 3D-driven association mechanism replaces traditional 2D multi-object tracking, using geometric consistency to guide identity preservation across wide-baseline viewpoints and varying imaging conditions. By combining 2D texture cues with global 3D context, the proposed pipeline is well-suited for scalable street-level processing and can be used for a variety of object types.
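The abstract does not spell out how the 3D-driven association works. As a rough illustration only, the sketch assumes that each zero-shot 2D mask has already been linked to the set of structure-from-motion 3D point IDs whose keypoints fall inside it. It then merges masks from different views when their 3D point sets overlap strongly (Jaccard index at or above an assumed threshold) and uses union-find to assign one identity per object. The input format, the `associate_masks` helper, and the 0.3 threshold are all hypothetical. This is a generic point-sharing heuristic and not the authors' method.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to merge per-view masks into object identities."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def associate_masks(masks, min_jaccard=0.3):
    """Group 2D masks from different views into 3D object identities.

    masks: list of (image_id, set_of_sfm_point_ids) pairs (hypothetical format);
           each set holds IDs of SfM 3D points observed inside that mask.
    min_jaccard: assumed threshold on 3D point-set overlap.
    Returns one integer identity label per mask.
    """
    uf = UnionFind(len(masks))

    # Inverted index: 3D point ID -> indices of masks that observe it.
    point_to_masks = defaultdict(set)
    for i, (_, pts) in enumerate(masks):
        for p in pts:
            point_to_masks[p].add(i)

    # Only compare mask pairs that share at least one 3D point.
    candidate_pairs = set()
    for idxs in point_to_masks.values():
        ordered = sorted(idxs)
        for pos, a in enumerate(ordered):
            for b in ordered[pos + 1:]:
                candidate_pairs.add((a, b))

    for a, b in candidate_pairs:
        img_a, pts_a = masks[a]
        img_b, pts_b = masks[b]
        if img_a == img_b:
            continue  # two masks in the same image are treated as distinct instances
        jaccard = len(pts_a & pts_b) / len(pts_a | pts_b)
        if jaccard >= min_jaccard:
            uf.union(a, b)

    # Relabel union-find roots as consecutive identity IDs.
    roots = {}
    return [roots.setdefault(uf.find(i), len(roots)) for i in range(len(masks))]


# Toy example with made-up point IDs.
masks = [
    (0, {1, 2, 3, 4}),    # object A seen in image 0
    (1, {2, 3, 4, 5}),    # object A seen in image 1
    (1, {10, 11, 12}),    # object B seen in image 1
    (2, {11, 12, 13}),    # object B seen in image 2
    (2, {20}),            # unmatched mask in image 2
]
labels = associate_masks(masks)
print(labels)  # [0, 0, 1, 1, 2]
```

Because union-find merges transitively, two masks from the same image could still receive one label through a chain of cross-view matches; a fuller treatment would check for such conflicts before each merge.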