RS<sup>3</sup>Mamba: Visual State Space Model for Remote Sensing Image Semantic Segmentation (Paper)

2024 · IEEE Geoscience and Remote Sensing Letters · Citations: 267
Remote-Sensing Image Classification · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Abstract

Semantic segmentation of remote sensing images is a fundamental task in geoscience research. However, convolutional neural networks (CNNs) and transformers both have significant shortcomings: the former are limited by insufficient long-range modeling capability, while the latter are hampered by high computational complexity. Recently, a novel visual state space (VSS) model represented by Mamba has emerged, capable of modeling long-range relationships with linear computational complexity. In this research, we propose a novel dual-branch network named remote sensing image semantic segmentation Mamba (RS3Mamba), designed specifically for remote sensing tasks. RS3Mamba uses VSS blocks to construct an auxiliary branch that provides additional global information to a convolution-based main branch. Moreover, considering the distinct characteristics of the two branches, we introduce a collaborative completion module (CCM) that refines and fuses features from the dual encoder using a novel adaptive mechanism. In experiments on two widely used datasets, the proposed RS3Mamba outperformed state-of-the-art methods in mean intersection over union (mIoU) by 0.66% on ISPRS Vaihingen and by 1.70% on LoveDA Urban, demonstrating its effectiveness and potential. The source code is available at https://github.com/sstary/SSRS.
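
For readers unfamiliar with the metric behind the reported gains, the following is a minimal sketch of how mIoU is commonly computed from a confusion matrix. It is not taken from the SSRS repository: the function names `confusion_matrix` and `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration. The abstract does not state the paper's exact evaluation protocol (for example, which classes are averaged).

```python
from typing import List, Sequence


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Count pixels: rows index the ground-truth class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN) over the classes that occur."""
    matrix = confusion_matrix(y_true, y_pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                   # true class c, predicted otherwise
        fp = sum(row[c] for row in matrix) - tp    # predicted c, true class otherwise
        union = tp + fp + fn
        if union == 0:
            # Assumption: a class absent from both labels and predictions is skipped.
            continue
        ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical flattened label maps for a six-pixel image with three classes.
labels = [0, 0, 1, 1, 2, 2]
preds = [0, 1, 1, 1, 2, 0]
print(mean_iou(labels, preds, num_classes=3))
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. In practice the label maps would be flattened per-pixel class indices accumulated over the whole test set before averaging.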