Image-Conditioned Instance Prompt Network for Referring Remote Sensing Image Segmentation

arXiv cs.CV, 2026-05-26. Authors: Biaoyu Ren (School of Computer Science, Northwestern Polytechnical University, Xi'an, China), Qingsheng Wang (School of Computer Science, Northwestern Polytechnical University, Xi'an, China), Cun Xu (School of Computer Science, Northwestern Polytechnical University, Xi'an, China), Dingkang Yang (College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai, China), Wenxuan Wang (School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Shenzhen Research Institute of Northwestern Polytechnical University, Shenzhen, China)

Abstract

arXiv:2605.24532v1 (announce type: new)

Referring Remote Sensing Image Segmentation (RRSIS) is a situated, task-driven cross-modal task related to the embodied perception paradigm. It requires models to align visual-spatial features with linguistic intentions for precise target perception. Recent research has focused on refining the granularity of textual features and optimizing image-text feature fusion to better guide target feature representations. However, insufficient descriptive granularity and sensitivity to semantic shifts can still create bottlenecks in cross-modal feature fusion. To address these issues, we propose the Image-Conditioned Instance Prompt Network (ICIPNet) with Bilateral Information Fusion. ICIPNet introduces an Image-Conditioned Instance Prompt (ICIP) module that generates self-adaptive visual and semantic representations without external knowledge.