Event: Can Segmentation Models Understand the World? Towards Proactive Affordance Reasoning via Visual Chain-of-Thought

PRODUCT_LAUNCH · 2026-05-28 · Impact: MEDIUM

arXiv:2605.27764v1 (Announce Type: new)

Abstract: Recent segmentation models couple large language models (LLMs) with mask decoders to ground complex language expressions into masks, yet their instructions remain target-referential: they describe, constrain, or imply the region to be segmented. However, in real-world embodied interaction, human instructions are often at the intent-leve…
