3ViewSense: Spatial and Mental Perspective Reasoning from Orthographic Views in Vision-Language Models · Event

PRODUCT_LAUNCH · 2026-06-01 · Impact: MEDIUM

arXiv:2603.07751v2 · Announce Type: replace

Abstract: Current Large Language Models have achieved Olympiad-level logic, yet Vision-Language Models paradoxically falter on elementary spatial tasks like block counting. This capability mismatch reveals a critical "spatial intelligence gap," where models fail to construct coherent 3D mental representations from 2D observations. We uncover this gap
