Spatial-aware Vision Language Model for Autonomous Driving

ArXiv CS.CV | 2026-05-26 | Authors: Weijie Wei, Zhipeng Luo, Ling Feng, Venice Erin Liong

Abstract

arXiv:2512.24331v2

While Vision-Language Models (VLMs) show significant promise for end-to-end autonomous driving by leveraging the common sense embedded in language models, their reliance on 2D image cues for complex scene understanding and decision-making is a critical bottleneck for safety and reliability. Current image-based methods struggle with accurate metric spatial reasoning and geometric inference, which leads to unreliable driving policies. To bridge this gap, we propose LVLDrive (LiDAR-Vision-Language), a novel framework designed to upgrade existing VLMs with robust 3D metric spatial understanding for autonomous driving by incorporating LiDAR point clouds as an additional input modality. A key challenge lies in mitigating the catastrophic disturbance that disparate 3D data introduces to the pre-trained VLMs.
