Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling

ArXiv CS.CV | 2026-06-02 | Authors: Xiaolong Tang, Meina Kan, Shiguang Shan, Xilin Chen

Abstract

arXiv:2505.17659v4 (announce type: replace-cross)

Safe and feasible trajectory planning is critical for real-world autonomous driving systems. However, existing learning-based planners rely heavily on expert demonstrations, which not only lack explicit safety awareness but also risk passing on undesirable behaviors, such as speeding, from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a two-stage trajectory planning framework that decouples principle alignment from behavior learning. In the first stage, a general trajectory predictor is pre-trained on expert data to capture diverse, human-like driving behaviors. In the second stage, the model is fine-tuned with rule-based rewards using Group Relative Policy Optimization (GRPO), explicitly aligning ego planning with principles such as safety, comfort, and traffic rule compliance.
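To make the second stage more concrete, here is a minimal sketch of the group-relative advantage step that GRPO is generally built around: several candidate trajectories are sampled for the same scene, each is scored by a rule-based reward, and each reward is normalized against the mean and standard deviation of its own group. The abstract does not give the reward terms or their weights, so `rule_based_reward`, its penalty values, the 4.0 m/s² comfort threshold, and the rollout fields are all hypothetical. Using the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev


def rule_based_reward(collided, over_speed_limit, max_abs_accel):
    """Hypothetical rule-based score; terms and weights are illustrative only."""
    reward = 1.0
    if collided:              # safety
        reward -= 1.0
    if over_speed_limit:      # traffic rule compliance
        reward -= 0.5
    if max_abs_accel > 4.0:   # comfort (assumed threshold, m/s^2)
        reward -= 0.25
    return reward


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the statistics of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)   # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group: four candidate ego trajectories sampled for the same scene.
rollouts = [
    {"collided": False, "over_speed_limit": False, "max_abs_accel": 1.5},
    {"collided": True,  "over_speed_limit": False, "max_abs_accel": 2.0},
    {"collided": False, "over_speed_limit": True,  "max_abs_accel": 1.0},
    {"collided": False, "over_speed_limit": False, "max_abs_accel": 5.0},
]

rewards = [rule_based_reward(**r) for r in rollouts]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.3f}")
```

In this sketch the clean trajectory receives the largest positive advantage and the colliding one the most negative. In a GRPO-style update, these advantages would weight the policy-gradient term for each sampled trajectory. If every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.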