Reasmory: 3D Reconstruction as Explicit Memory for VLMs Spatial Reasoning

Event

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.00963v1 | Announce Type: new

Abstract: Vision-Language Models (VLMs) exhibit emerging spatial reasoning capabilities, yet they remain unreliable on tasks requiring precise spatial understanding, such as viewpoint reasoning, directional comparison, and distance estimation. In multi-view images and monocular videos, relevant spatial cues are often sparse and distributed across redundant observations, making them di