Don't Guess, Just Ask: Resolving Ambiguity in Referring Segmentation via Multi-turn Clarification

ArXiv CS.CV, 2026-05-26. Authors: Yuting Yang, Haichao Jiang, Tianming Liang, Quan Zhang, Jian-Fang Hu

Abstract

arXiv:2605.17531v2

Referring segmentation aims to segment target objects in images or videos based on a textual query. Despite remarkable progress in recent years, existing work assumes that user-provided queries are already precise and clear. This assumption is impractical: in real-world scenarios, it is unrealistic to expect all users to thoroughly review their visual content and carefully ensure that their queries are unique and unambiguous. When faced with an ambiguous query, existing segmentation models tend to guess the user's intent arbitrarily, often producing undesired outcomes. To address this limitation, we propose IC-Seg, a novel agentic framework that proactively clarifies user intent through multi-turn conversation before segmentation.