Does Visual Information Play a Decisive Role in Vision-Language-Action Model Driving Behavior?

ArXiv CS.CV | 2026-06-01 | Authors: Jingtao He, Hongliang Lu, Xiaoyun Qiu, Yixuan Wang, Xinhu Zheng

Abstract

arXiv:2605.31041v1 (announce type: new). Vision-Language-Action (VLA) models have demonstrated promising capabilities in autonomous driving, highlighting the potential of unified multimodal architectures for jointly modeling perception and planning. However, how current VLA-based driving behavior is grounded in visual information remains poorly understood. Existing evaluation protocols focus mainly on aggregate performance metrics and lack structured, practical diagnostics for quantifying visual-behavior dependency. In this work, we introduce a structured multi-level visual perturbation framework to systematically analyze visual-behavior dependency in VLA-based driving models. The framework organizes controlled visual perturbations along three complementary dimensions: channel-level degradation, information-level disruption, and structure-level modification.
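
To make the three perturbation dimensions concrete, here is a minimal Python sketch. The abstract does not specify the actual operators or the dependency metric, so everything below is an illustrative assumption: `drop_channel` stands in for channel-level degradation, `mask_region` for information-level disruption, `shuffle_patches` for structure-level modification, and `trajectory_shift` is a hypothetical way to score how far a planned trajectory moves when the input is perturbed. None of these names come from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Pixel = List[int]          # [R, G, B], values in 0..255
Image = List[List[Pixel]]  # rows of pixels, H x W x 3


def drop_channel(img: Image, channel: int) -> Image:
    """Channel-level degradation (assumed form): zero out one colour channel."""
    return [[[0 if c == channel else v for c, v in enumerate(px)] for px in row]
            for row in img]


def mask_region(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Information-level disruption (assumed form): black out a rectangle."""
    out = [[list(px) for px in row] for row in img]
    for y in range(top, min(top + height, len(out))):
        for x in range(left, min(left + width, len(out[y]))):
            out[y][x] = [0, 0, 0]
    return out


def shuffle_patches(img: Image, patch: int, seed: int = 0) -> Image:
    """Structure-level modification (assumed form): permute square patches."""
    h, w = len(img), len(img[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    coords = [(y, x) for y in range(0, h, patch) for x in range(0, w, patch)]
    sources = coords[:]
    random.Random(seed).shuffle(sources)
    out = [[[0, 0, 0] for _ in range(w)] for _ in range(h)]
    # Copy each source patch into a (possibly different) destination slot.
    for (dy, dx), (sy, sx) in zip(coords, sources):
        for i in range(patch):
            for j in range(patch):
                out[dy + i][dx + j] = list(img[sy + i][sx + j])
    return out


def trajectory_shift(clean: Sequence[Tuple[float, float]],
                     perturbed: Sequence[Tuple[float, float]]) -> float:
    """Hypothetical dependency score: mean L2 distance between waypoints."""
    if len(clean) != len(perturbed) or not clean:
        raise ValueError("trajectories must be non-empty and equally long")
    return sum(math.dist(a, b) for a, b in zip(clean, perturbed)) / len(clean)
```

Under these assumptions, one would run the driving model on both the clean frame and each perturbed frame, then compare the two planned trajectories with `trajectory_shift`. A score near zero would suggest the behavior barely depends on the visual content that was perturbed.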
