Limits of Spatial Imagery Reasoning in Frontier LLM Models 文章

ArXiv CS.CV2026-06-02NEWSen作者: Sergio Y. Hayashi, Nina S. T. Hirata

摘要

arXiv:2603.26779v2 Announce Type: replace Abstract: Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, yet they struggle with spatial tasks that require mental simulation, such as mental rotation. This paper investigates whether equipping an LLM with an external ``Imagery Module'' -- a tool capable of rendering and rotating 3D models -- can bridge this gap, functioning as a ``cognitive prosthetic.'' We conducted experiments using a dual-module architecture in which a reasoning module (an MLLM) interacts with an imagery module on 3D model rotation tasks. Performance was lower than expected, with accuracy reaching at most 62.5%. Further investigation suggests that even when the burden of maintaining and manipulating a holistic 3D state is outsourced, the system still fails. This reveals that current frontier models lack the foundational visual-spatial primitives required to interface with imagery.

相关事件查看全部 (1)

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据

相关技术

暂无数据