AD-H: Language-guided Autonomous Driving with Hierarchical Agents

arXiv cs.CV, 2026-05-27. Authors: Zaibin Zhang, Talas Fu, Shiyu Tang, Yuanhang Zhang, Yifan Wang, Lijun Wang, Huchuan Lu

Abstract

arXiv:2406.03474v2 (announce type: replace)

Language-guided autonomous driving requires bridging a large abstraction gap between high-level natural-language instructions and low-level vehicle control. End-to-end approaches that use a single multimodal large language model (MLLM) to map language directly to actions struggle with this mismatch: they often fail to exploit the model's reasoning capabilities and generalize poorly beyond the distributions of the driving datasets used for fine-tuning. To address this issue, we propose AD-H, a hierarchical multi-agent framework that explicitly separates high-level decision-making from low-level vehicle execution. At the upper level, an MLLM-based planner interprets natural-language commands and environmental context to generate coherent mid-level driving instructions. At the lower level, a lightweight controller converts these mid-level instructions into precise, continuous control actions.
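The two-level split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name here (`Control`, `StubPlanner`, `StubController`, `drive_step`), the set of mid-level commands, the `obstacle_ahead` context key, and the steering sign convention are assumptions made for illustration. The planner is a keyword-rule stand-in for an MLLM and the controller is a fixed lookup table standing in for a learned lightweight policy.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Control:
    """Continuous low-level action (hypothetical fields and ranges)."""
    steer: float     # assumed convention: -1.0 full left, 1.0 full right
    throttle: float  # 0.0 to 1.0
    brake: float     # 0.0 to 1.0


class Planner(Protocol):
    def plan(self, instruction: str, context: dict) -> str: ...


class Controller(Protocol):
    def act(self, mid_level: str, context: dict) -> Control: ...


class StubPlanner:
    """Stand-in for the MLLM-based planner: keyword rules, not a real model."""

    def plan(self, instruction: str, context: dict) -> str:
        # Environmental context can override the language command.
        if context.get("obstacle_ahead", False):
            return "stop"
        text = instruction.lower()
        if "left" in text:
            return "turn left"
        if "right" in text:
            return "turn right"
        return "go straight"


class StubController:
    """Stand-in for the lightweight controller: a fixed lookup table."""

    _TABLE = {
        "stop": Control(steer=0.0, throttle=0.0, brake=1.0),
        "turn left": Control(steer=-0.5, throttle=0.3, brake=0.0),
        "turn right": Control(steer=0.5, throttle=0.3, brake=0.0),
        "go straight": Control(steer=0.0, throttle=0.5, brake=0.0),
    }

    def act(self, mid_level: str, context: dict) -> Control:
        return self._TABLE[mid_level]


def drive_step(planner: Planner, controller: Controller,
               instruction: str, context: dict) -> tuple[str, Control]:
    """One step: high-level instruction -> mid-level command -> control."""
    mid_level = planner.plan(instruction, context)
    return mid_level, controller.act(mid_level, context)
```

Because the two levels communicate only through the mid-level command string, either one could be swapped out independently in this sketch, which mirrors the separation of decision-making from execution that the abstract describes.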