FloorplanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations 文章

ArXiv CS.AI2026-05-26NEWSen作者: Fedor Rodionov, Abdelrahman Eldesokey, Michael Birsak, John Femiani, Bernard Ghanem, Peter Wonka

详细信息

来源站点: ArXiv CS.AI
作者: Fedor Rodionov, Abdelrahman Eldesokey, Michael Birsak, John Femiani, Bernard Ghanem, Peter Wonka
文章类型: NEWS
语言: en
发布日期: 2026-05-26

原文

摘要

arXiv:2507.07644v4 Announce Type: replace Abstract: We introduce FloorplanQA, a diagnostic benchmark for evaluating spatial reasoning in large language models (LLMs). FloorplanQA is grounded in structured representations of indoor scenes, such as (e.g., kitchens, living rooms, bedrooms, bathrooms, and others), encoded symbolically in JSON or XML layouts. The benchmark covers core spatial tasks, including distance measurement, visibility, path finding, and object placement within constrained spaces. Our results across a variety of frontier open-source and commercial LLMs reveal that while models may succeed in shallow queries, they often fail to respect physical constraints, preserve spatial coherence, though they remain mostly robust to small spatial perturbations. FloorplanQA uncovers a blind spot in today's LLMs: inconsistent reasoning about indoor layouts.

FloorplanQA: A Benchmark for Spatial Reasoning in LLMs using Structured Representations 文章

详细信息

摘要

相关事件

相关公司查看全部 (3)

相关人物

相关产品查看全部 (6)

相关技术查看全部 (24)