Improving Visual Grounding in Remote Sensing via Cluster-Guided Refinement and Model Ensemble Voting

ArXiv CS.CV | 2026-06-02 | News | Authors: Panav Shah, Geet Sethi, Ashutosh Gandhe

Abstract

arXiv:2606.00556v1

Visual grounding aims to locate image regions that correspond to natural language descriptions and is a key component of interpretable vision systems. In remote sensing imagery, grounding is particularly challenging due to complex scenes, small objects, and large variations in scale. Relying on a single model is often insufficient to address these diverse challenges. In this work, we propose two grounding pipelines, Sequential Grounding Refinement (SGR) and Cluster-Aware Grounding Refinement (CGR), that combine the complementary strengths of RemoteSAM, a visual grounding model specialized for remote sensing, and SAM3, a powerful general-purpose segmentation model. Our approach first uses RemoteSAM to obtain an initial estimate of object location, which is then refined using SAM3 to produce more accurate and spatially consistent segmentations.
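The coarse-then-refine idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the initial estimate is handed to the second model, so passing a padded bounding box as the prompt is an assumption made here. `coarse_model` and `refine_model` are hypothetical callables standing in for RemoteSAM and SAM3, and no real API of either model is used.

```python
from typing import Callable, List, Optional, Tuple

Mask = List[List[bool]]          # binary mask, indexed as mask[y][x]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel coordinates


def mask_to_box(mask: Mask, pad: int = 0) -> Optional[Box]:
    """Return the padded bounding box of the True pixels, or None if the mask is empty."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    height, width = len(mask), len(mask[0])
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    # Clamp the padded box to the image bounds.
    return (
        max(min(xs) - pad, 0),
        max(min(ys) - pad, 0),
        min(max(xs) + pad, width - 1),
        min(max(ys) + pad, height - 1),
    )


def sequential_refine(
    image: object,
    query: str,
    coarse_model: Callable[[object, str], Mask],   # hypothetical stand-in for the grounding model
    refine_model: Callable[[object, Box], Mask],   # hypothetical stand-in for the segmentation model
    pad: int = 2,
) -> Mask:
    """Stage 1: text-conditioned coarse mask. Stage 2: box-prompted refinement."""
    coarse = coarse_model(image, query)
    box = mask_to_box(coarse, pad)
    if box is None:
        # Nothing was grounded, so there is no region to refine.
        return coarse
    return refine_model(image, box)
```

Padding the box gives the second stage some context around a small object, which is a design choice of this sketch rather than something stated in the abstract.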
