Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior? (Article)

ArXiv CS.CV | 2026-06-01 | NEWS | en | Authors: Jingtao He, Hongliang Lu, Xiaoyun Qiu, Yixuan Wang, Xinhu Zheng

Details

Source site: ArXiv CS.CV
Authors: Jingtao He, Hongliang Lu, Xiaoyun Qiu, Yixuan Wang, Xinhu Zheng
Article type: NEWS
Language: en
Publication date: 2026-06-01

Abstract

arXiv:2605.31041v1 (announce type: new)

Vision-Language-Action (VLA) models have demonstrated promising capability in autonomous driving, highlighting the potential of unified multimodal architectures for jointly modeling perception and planning. However, the extent to which current VLA-based driving behavior is grounded in visual information remains poorly understood. Existing evaluation protocols mainly focus on aggregate performance metrics and lack structured, practical diagnostics for quantifying visual-behavior dependency. In this work, we introduce a structured multi-level visual perturbation framework to systematically analyze visual-behavior dependency in VLA-based driving models. The framework organizes controlled visual perturbations along three complementary dimensions: channel-level degradation, information-level disruption, and structure-level modification.
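To make the three perturbation dimensions concrete, here is a minimal Python sketch. The abstract does not specify the actual operators or the dependency metric, so everything below is an illustrative assumption: `drop_channel`, `mask_region`, `shuffle_patches`, and `mean_l2_deviation` are hypothetical names, and each is only one plausible instance of its category, not the authors' implementation.

```python
import random

# An image is a list of rows; each row is a list of (r, g, b) tuples in 0..255.
# All operators below are assumed examples, not taken from the paper.

def drop_channel(image, channel):
    """Channel-level degradation (assumed): zero out one color channel."""
    return [[tuple(0 if i == channel else v for i, v in enumerate(px))
             for px in row] for row in image]

def mask_region(image, top, left, height, width, fill=(0, 0, 0)):
    """Information-level disruption (assumed): blank out a rectangle."""
    out = [list(row) for row in image]
    for y in range(top, min(top + height, len(out))):
        for x in range(left, min(left + width, len(out[y]))):
            out[y][x] = fill
    return out

def shuffle_patches(image, patch, seed=0):
    """Structure-level modification (assumed): permute square patches."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    coords = [(y, x) for y in range(0, h, patch) for x in range(0, w, patch)]
    sources = coords[:]
    random.Random(seed).shuffle(sources)  # seeded for reproducibility
    out = [list(row) for row in image]
    for (dy, dx), (sy, sx) in zip(coords, sources):
        for i in range(patch):
            for j in range(patch):
                out[dy + i][dx + j] = image[sy + i][sx + j]
    return out

def mean_l2_deviation(base, perturbed):
    """Assumed dependency score: mean distance between matched waypoints
    planned from the clean image and from the perturbed image."""
    dists = [((bx - px) ** 2 + (by - py) ** 2) ** 0.5
             for (bx, by), (px, py) in zip(base, perturbed)]
    return sum(dists) / len(dists)
```

Under these assumptions, a diagnostic run would feed the clean and perturbed frames to the same driving model and compare the two planned trajectories with `mean_l2_deviation`; a score near zero under heavy perturbation would suggest weak visual grounding.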
