Event: Vanilla ViT for Automotive Point Cloud Semantic Segmentation

PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM

arXiv:2605.31177v1 (announce type: new). Abstract: Plain Transformers have become the de facto architecture for processing text, audio, image, and video, offering a unified backbone for multimodal learning. However, state-of-the-art architectures for point cloud semantic segmentation remain dominated by U-Net architectures in which convolutions are interleaved with local or windowed attention. In this work, we show how to effectively leve