Abstract
arXiv:2410.12673v3 (Announce Type: replace)

Accurate 3D object detection in autonomous driving relies on Bird's Eye View (BEV) perception and effective temporal fusion. However, existing fusion strategies based on convolutional layers or deformable self-attention struggle to model global context in BEV space, which reduces accuracy for large objects. To address this limitation, we propose MambaBEV, a novel BEV-based 3D object detection model that leverages Mamba2, an advanced state-space model (SSM) optimized for long-sequence processing. Our key contribution is TemporalMamba, a temporal fusion module that enhances global context modeling through a discrete rearrangement mechanism for BEV features, tailored for sequential processing. In addition, we introduce a Mamba-based DETR head to improve multi-object representation. Evaluations on the nuScenes dataset show that MambaBEV-base achieves 51.7% NDS and 42.7% mAP.
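The abstract does not say how TemporalMamba rearranges BEV features, so the following is only a hypothetical sketch of the general idea of serializing two aligned BEV frames into one 1D sequence that a sequence model can scan, then mapping the outputs back onto the grid. Every choice here is an assumption, not MambaBEV's method: the snake (boustrophedon) scan order, interleaving the previous-frame and current-frame token at each cell, one scalar feature per cell, and the fixed-coefficient `linear_scan` used as a stand-in for Mamba2's selective, input-dependent SSM. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # H x W BEV map, one scalar feature per cell (simplifying assumption)


def snake_order(h: int, w: int) -> List[Tuple[int, int]]:
    """Hypothetical boustrophedon scan: left-to-right on even rows, right-to-left on
    odd rows, so consecutive cells in the sequence are always spatial neighbours."""
    order: List[Tuple[int, int]] = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def interleave_frames(curr: Grid, prev: Grid) -> Tuple[List[float], List[Tuple[int, int, int]]]:
    """Hypothetical rearrangement: for each cell in snake order, emit the previous-frame
    token immediately before the current-frame token at the same location.
    Returns the sequence plus (frame, row, col) indices for undoing the rearrangement."""
    h, w = len(curr), len(curr[0])
    seq: List[float] = []
    index: List[Tuple[int, int, int]] = []
    for r, c in snake_order(h, w):
        seq.append(prev[r][c])
        index.append((0, r, c))  # frame 0 = previous BEV frame
        seq.append(curr[r][c])
        index.append((1, r, c))  # frame 1 = current BEV frame
    return seq, index


def linear_scan(seq: Sequence[float], a: float = 0.9, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Toy scalar state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
    A placeholder for Mamba2, whose selective SSM makes these coefficients input-dependent."""
    state, out = 0.0, []
    for x in seq:
        state = a * state + b * x
        out.append(c * state)
    return out


def restore_current(values: Sequence[float], index: Sequence[Tuple[int, int, int]], h: int, w: int) -> Grid:
    """Scatter the outputs that belong to the current frame back onto the H x W grid."""
    grid = [[0.0] * w for _ in range(h)]
    for v, (frame, r, col) in zip(values, index):
        if frame == 1:
            grid[r][col] = v
    return grid


if __name__ == "__main__":
    curr = [[1.0, 2.0], [3.0, 4.0]]
    prev = [[0.5, 0.5], [0.5, 0.5]]
    seq, idx = interleave_frames(curr, prev)
    fused = restore_current(linear_scan(seq), idx, h=2, w=2)
    print(seq)
    print(fused)
```

Interleaving by cell keeps each location's temporal pair adjacent in the sequence, and the snake order keeps spatial neighbours close, so a causal scan sees both temporal and nearby spatial context; whether the paper's mechanism works this way is not stated in the abstract.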