Conformer: Local Features Coupling Global Representations for Visual Recognition (paper)

2021 IEEE/CVF International Conference on Computer Vision (ICCV) · Cited by 774

Topics: Artificial Intelligence; Advanced Neural Network Applications; Domain Adaptation and Few-Shot Learning; Multimodal Machine Learning Applications


Abstract

Within a Convolutional Neural Network (CNN), convolution operations are good at extracting local features but have difficulty capturing global representations. Within a visual transformer, cascaded self-attention modules can capture long-distance feature dependencies but unfortunately deteriorate local feature details. In this paper, we propose a hybrid network structure, termed Conformer, that takes advantage of both convolutional operations and self-attention mechanisms for enhanced representation learning. Conformer is built on the Feature Coupling Unit (FCU), which interactively fuses local features and global representations at different resolutions. Conformer adopts a concurrent structure so that local features and global representations are retained to the maximum extent. Experiments show that, with comparable parameter complexity, Conformer outperforms the visual transformer DeiT-B by 2.3% on ImageNet. On MSCOCO, it outperforms ResNet-101 by 3.7% and 3.6% mAP for object detection and instance segmentation, respectively, demonstrating its great potential as a general backbone network. Code is available at github.com/pengzhiliang/Conformer.
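The abstract does not spell out how the FCU bridges the two branches' different resolutions. The sketch below is a rough illustration built on several assumptions. CNN → tokens uses a 1×1 channel projection followed by average pooling down to the patch grid. Tokens → CNN uses a 1×1 projection followed by nearest-neighbour upsampling. The two branches are merged by addition. All function names are hypothetical. Normalization, activations, the class token, and the convolution and self-attention blocks themselves are omitted. The code is pure Python on nested lists so that it runs without a deep-learning framework. It is not the authors' implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]            # [rows][cols], e.g. tokens as [num_tokens][embed_dim]
FeatureMap = List[List[List[float]]]  # [channels][height][width]


def conv1x1(fmap: FeatureMap, weight: Matrix) -> FeatureMap:
    """1x1 convolution: mixes channels independently at every spatial position.
    weight has shape [out_channels][in_channels]."""
    in_c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(weight[o][i] * fmap[i][y][x] for i in range(in_c)) for x in range(w)] for y in range(h)]
        for o in range(len(weight))
    ]


def avg_pool(fmap: FeatureMap, s: int) -> FeatureMap:
    """Non-overlapping s x s average pooling (height and width assumed divisible by s)."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [sum(ch[y * s + dy][x * s + dx] for dy in range(s) for dx in range(s)) / (s * s)
             for x in range(w // s)]
            for y in range(h // s)
        ]
        for ch in fmap
    ]


def upsample_nearest(fmap: FeatureMap, s: int) -> FeatureMap:
    """Nearest-neighbour upsampling by an integer factor s."""
    return [[[ch[y // s][x // s] for x in range(len(ch[0]) * s)] for y in range(len(ch) * s)] for ch in fmap]


def cnn_to_tokens(fmap: FeatureMap, w_down: Matrix, stride: int) -> Matrix:
    """Hypothetical CNN -> transformer step: align channels to the embedding
    size, downsample to the patch grid, flatten row-major into tokens."""
    pooled = avg_pool(conv1x1(fmap, w_down), stride)
    d, gh, gw = len(pooled), len(pooled[0]), len(pooled[0][0])
    return [[pooled[c][y][x] for c in range(d)] for y in range(gh) for x in range(gw)]


def tokens_to_cnn(tokens: Matrix, w_up: Matrix, grid_h: int, grid_w: int, stride: int) -> FeatureMap:
    """Hypothetical transformer -> CNN step: reshape row-major tokens to the
    patch grid, align channels, upsample back to the CNN resolution."""
    d = len(tokens[0])
    grid = [[[tokens[y * grid_w + x][c] for x in range(grid_w)] for y in range(grid_h)] for c in range(d)]
    return upsample_nearest(conv1x1(grid, w_up), stride)


def couple(fmap: FeatureMap, tokens: Matrix, w_down: Matrix, w_up: Matrix,
           stride: int) -> Tuple[FeatureMap, Matrix]:
    """One simplified, hypothetical coupling exchange between the concurrent
    branches: CNN features are added to the tokens, then the updated tokens
    are added back onto the CNN map. In a full network, a self-attention block
    and a convolution block would sit between these two additions."""
    down = cnn_to_tokens(fmap, w_down, stride)
    new_tokens = [[t + u for t, u in zip(tr, dr)] for tr, dr in zip(tokens, down)]
    up = tokens_to_cnn(new_tokens, w_up, len(fmap[0]) // stride, len(fmap[0][0]) // stride, stride)
    new_fmap = [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)] for ca, cb in zip(fmap, up)]
    return new_fmap, new_tokens
```

With a toy 2-channel 4×4 CNN map, 3-dimensional tokens, and stride 2, `couple` yields 4 tokens on a 2×2 patch grid and a 2×4×4 map of the original size. This is the resolution round trip the abstract refers to. Pooling and upsampling are only one plausible way to align the branches. The released code at the repository above is the authoritative reference.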

Authors (6)

Qixiang Ye

Jianbin Jiao

Yaowei Wang

Lingxi Xie
