M2H-MX: Multi-Task Semantic and Geometric Perception for Real-Time Monocular 3D Scene Graph Construction (Article)

ArXiv CS.CV | 2026-05-26 | NEWS | en | Authors: U. V. B. L. Udugama, George Vosselman, Francesco Nex

Details

Source site: ArXiv CS.CV
Authors: U. V. B. L. Udugama, George Vosselman, Francesco Nex
Article type: NEWS
Language: en
Publication date: 2026-05-26

Abstract

arXiv:2603.29236v2 (announce type: replace)

Monocular cameras are attractive for robotic perception due to their low cost and ease of deployment, yet achieving reliable real-time spatial understanding from a single image stream remains challenging. While recent multi-task dense prediction models have improved per-pixel depth and semantic estimation, translating these advances into stable monocular mapping systems is still non-trivial. This paper presents M2H-MX, a real-time multi-task perception model for monocular spatial understanding. The model preserves multi-scale feature representations while introducing register-gated global context and controlled cross-task interaction in a lightweight decoder, enabling depth and semantic predictions to reinforce each other under strict latency constraints. Its outputs integrate directly into an unmodified monocular SLAM pipeline through a compact perception-to-mapping interface.
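The abstract does not spell out how the gated context and cross-task interaction are computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that a few global context tokens (standing in for the "registers") are pooled into one vector, that this vector produces per-channel gates, and that each task branch adds a gated copy of the other branch's features. The function name `gated_cross_task`, the weight matrices, and all shapes are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_cross_task(depth_feat, sem_feat, registers, w_gate_d, w_gate_s):
    """Hypothetical gated feature exchange between two task branches.

    depth_feat, sem_feat: (C, H, W) feature maps of the depth and semantic branches.
    registers: (R, C) global context tokens (assumed stand-in for register tokens).
    w_gate_d, w_gate_s: (C, C) gate projection weights (assumed, not from the paper).
    """
    # Pool the global tokens into a single context vector of length C.
    context = registers.mean(axis=0)
    # Per-channel gates in (0, 1) decide how much of the other task is mixed in.
    gate_d = sigmoid(w_gate_d @ context)
    gate_s = sigmoid(w_gate_s @ context)
    # Residual exchange: each branch keeps its own features and adds a gated
    # copy of the other branch, broadcast over the spatial dimensions.
    depth_out = depth_feat + gate_d[:, None, None] * sem_feat
    sem_out = sem_feat + gate_s[:, None, None] * depth_feat
    return depth_out, sem_out


# Usage with random inputs (all sizes are arbitrary illustration values).
rng = np.random.default_rng(0)
C, H, W, R = 8, 4, 6, 4
depth_feat = rng.standard_normal((C, H, W))
sem_feat = rng.standard_normal((C, H, W))
registers = rng.standard_normal((R, C))
w_gate_d = rng.standard_normal((C, C))
w_gate_s = rng.standard_normal((C, C))
depth_out, sem_out = gated_cross_task(depth_feat, sem_feat, registers, w_gate_d, w_gate_s)
```

Because the gates lie strictly between 0 and 1, the exchange stays bounded: a branch never receives more than one full copy of the other branch's features, which is one simple way to read "controlled" interaction.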
