Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

arXiv cs.CV, 2026-06-05 (news, en). Authors: Chenming Zhu, Jingli Lin, Yilin Long, Peizhou Cao, Tai Wang, Jiangmiao Pang, Xihui Liu

Abstract

arXiv:2606.06476v1 (announce type: new)

While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning abilities remain largely constrained to the observed images and text-oriented chain-of-thought. They often struggle to infer unobserved layouts, maintain cross-view consistency, and reason from alternative viewpoints when only limited egocentric observations are available. In this work, we study this problem as thinking with imagination, where a VLM actively acquires imagined visual evidence by interacting with a world simulator during reasoning. We propose Astra, an agentic spatial reasoning framework that empowers VLMs with action-conditioned visual imagination. Specifically, Astra couples Astra-VL, an RL-trained VLM policy, with Astra-WM, a Bagel-based world simulator that generates novel-view observations from context images and natural-language camera motions.
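The abstract describes a policy that alternates between reasoning and querying a world simulator with natural-language camera motions. The sketch below shows one way such a loop could look, under stated assumptions. It is not Astra's implementation. Every name is hypothetical, including `CameraMotion`, `Answer`, `Policy`, `WorldSimulator`, `reason_with_imagination`, `max_calls` and the toy stubs. Images are represented as opaque strings. The abstract does not say whether imagined views are fed back to the simulator, so this sketch conditions the simulator only on the original context images.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Union


@dataclass
class CameraMotion:
    """Hypothetical action: a natural-language camera motion such as 'turn left 90 degrees'."""
    text: str


@dataclass
class Answer:
    """Hypothetical terminal action: the policy's final answer."""
    text: str


Decision = Union[CameraMotion, Answer]


class Policy(Protocol):
    # Hypothetical stand-in for the VLM policy; images are opaque strings here.
    def decide(self, question: str, evidence: List[str], must_answer: bool) -> Decision: ...


class WorldSimulator(Protocol):
    # Hypothetical stand-in for a novel-view generator (context images + camera motion).
    def imagine(self, context: List[str], motion: CameraMotion) -> str: ...


@dataclass
class Trace:
    evidence: List[str]
    motions: List[str] = field(default_factory=list)
    answer: str = ""


def reason_with_imagination(policy: Policy, simulator: WorldSimulator,
                            question: str, observations: List[str],
                            max_calls: int = 3) -> Trace:
    """Interleave policy decisions with simulator calls until the policy answers."""
    trace = Trace(evidence=list(observations))
    for step in range(max_calls + 1):
        # Once max_calls simulator calls are spent, the policy is told it must answer.
        must_answer = step == max_calls
        decision = policy.decide(question, trace.evidence, must_answer)
        if isinstance(decision, Answer):
            trace.answer = decision.text
            return trace
        if must_answer:
            raise RuntimeError("policy requested a camera motion after the budget was spent")
        trace.motions.append(decision.text)
        # Assumption: the simulator sees only the real observations as context.
        trace.evidence.append(simulator.imagine(observations, decision))
    raise ValueError("max_calls must be non-negative")


class EchoSimulator:
    """Toy simulator: returns a label describing the imagined view."""
    def imagine(self, context, motion):
        return f"imagined<{context[0]} | {motion.text}>"


class LookLeftThenAnswer:
    """Toy policy: requests one imagined view, then answers."""
    def decide(self, question, evidence, must_answer):
        if len(evidence) == 1 and not must_answer:
            return CameraMotion("turn left 90 degrees")
        return Answer(f"answered from {len(evidence)} views")


trace = reason_with_imagination(LookLeftThenAnswer(), EchoSimulator(),
                                "What is to my left?", ["front_view.png"])
print(trace.motions)  # ['turn left 90 degrees']
print(trace.answer)   # answered from 2 views
```

The explicit call budget with a forced final answer is a design choice of this sketch. It keeps the loop bounded even if the policy keeps asking for new viewpoints. The abstract does not say how Astra-VL decides when to stop.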
