PillarDETR: YOLO-Backbone and RT-DETR Head for Real-Time 3D Object Detection

arXiv cs.CV, 2026-06-02
Authors: Smit Kadvani, Shriya Gumber, Kriti Faujdar, Harsh Dave

Abstract

arXiv:2606.01757v1

Real-time 3D object detection is a critical component for the safe operation of autonomous driving systems and robotics. While LiDAR point clouds provide accurate spatial information, processing them efficiently remains a significant challenge. Traditional methods rely on complex 3D convolutions or anchor-based paradigms that struggle to balance detection accuracy with inference speed. In this paper, we propose PillarDETR, a novel end-to-end 3D object detection architecture that combines the efficiency of pillar-based LiDAR encoding with the representational power of modern 2D vision models. Specifically, PillarDETR replaces standard convolutional backbones with a Cross Stage Partial (CSP) network derived from YOLOv8, enabling richer feature extraction from pseudo-images. Furthermore, we discard conventional anchor-based or center-based detection heads in favor of a Real-Time Detection Transformer (RT-DETR) decoder.
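The abstract does not spell out how PillarDETR builds its pseudo-images. The Python below is a minimal sketch of the general pillar-based idea, assuming a PointPillars-style scheme: LiDAR points are grouped into vertical pillars on a bird's-eye-view (BEV) grid, each pillar is reduced to a feature vector, and those vectors are scattered into a C × H × W image that a 2D backbone can consume. The detection range, the pillar size, and the three hand-crafted features (point count, mean height, max height) are hypothetical placeholders for a learned per-pillar encoder, and none of the function names come from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in metres
Image = List[List[List[float]]]      # indexed as [channel][row][col]

NUM_CHANNELS = 3  # length of the vector returned by pillar_features (hypothetical)


def grid_shape(x_range: Tuple[float, float], y_range: Tuple[float, float],
               pillar_size: float) -> Tuple[int, int]:
    """Number of BEV rows (along y) and columns (along x) for the assumed range."""
    height = math.ceil((y_range[1] - y_range[0]) / pillar_size)
    width = math.ceil((x_range[1] - x_range[0]) / pillar_size)
    return height, width


def pillarize(points: Sequence[Point], x_range: Tuple[float, float],
              y_range: Tuple[float, float],
              pillar_size: float) -> Dict[Tuple[int, int], List[Point]]:
    """Group points into vertical pillars keyed by (row, col) on the BEV grid."""
    height, width = grid_shape(x_range, y_range, pillar_size)
    pillars: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # drop points outside the assumed detection range
        # Clamp guards against float round-off pushing an index past the last cell.
        col = min(int((x - x_range[0]) // pillar_size), width - 1)
        row = min(int((y - y_range[0]) // pillar_size), height - 1)
        pillars[(row, col)].append((x, y, z))
    return dict(pillars)


def pillar_features(pts: List[Point]) -> List[float]:
    """Hand-crafted stand-in for a learned per-pillar encoder: [count, mean z, max z]."""
    zs = [z for _, _, z in pts]
    return [float(len(pts)), sum(zs) / len(zs), max(zs)]


def points_to_pseudo_image(points: Sequence[Point],
                           x_range: Tuple[float, float] = (0.0, 4.0),
                           y_range: Tuple[float, float] = (0.0, 4.0),
                           pillar_size: float = 1.0) -> Image:
    """Scatter per-pillar features into a C x H x W pseudo-image; empty cells stay 0."""
    height, width = grid_shape(x_range, y_range, pillar_size)
    image = [[[0.0] * width for _ in range(height)] for _ in range(NUM_CHANNELS)]
    for (row, col), pts in pillarize(points, x_range, y_range, pillar_size).items():
        for channel, value in enumerate(pillar_features(pts)):
            image[channel][row][col] = value
    return image
```

For example, with the default 4 m × 4 m range and 1 m pillars, the points (0.5, 0.5, 0.2) and (0.7, 0.2, 1.0) share the pillar at row 0, column 0, which receives count 2.0, mean height 0.6 and max height 1.0, while a point at x = 9.0 falls outside the range and is dropped. Most BEV cells stay empty, which is why a dense 2D backbone (per the abstract, a YOLOv8-derived CSP network) can process the result efficiently; in a learned model the hand-crafted features would be replaced by trained per-pillar embeddings.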