How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning
PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM
arXiv:2605.27310v1 (Announce Type: new)

Abstract: Cross-view spatial reasoning remains a weak spot for vision-language models (VLMs): they often reason in language and lose the fine-grained geometry needed for the task. Thinking with images aims to address this by generating an intermediate thinking image, but recent work shows that models often ignore the visual evidence in these traces. We the…