Can LLMs Reason Structurally? Benchmarking via the Lens of Data Structures

arXiv cs.AI, 2026-06-02. Authors: Yu He, Yingxi Li, Colin White, Ellen Vitercik

Abstract

arXiv:2505.24069v4. Large language models (LLMs) are deployed on increasingly complex tasks that require multi-step decision-making. Understanding their algorithmic reasoning abilities is therefore crucial. However, we lack a diagnostic benchmark for evaluating these capabilities. We propose to use data structures as a principled lens: as fundamental building blocks of algorithms, they naturally probe structural reasoning, the ability to understand and manipulate relationships such as order, hierarchy, and connectivity that underpin algorithmic reasoning. We introduce DSR-Bench (Data Structure Reasoning Benchmark), spanning 20 data structures, 35 operations, and 4,140 problem instances. DSR-Bench features hierarchical task organization, fully automated generation and evaluation, and fine-grained diagnostics. Evaluating 13 state-of-the-art LLMs reveals critical limitations: the top-performing model achieves a score of only 0.46 out of 1 on challenging instances.
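
To make "fully automated generation and evaluation" concrete, here is a minimal sketch of how one data-structure task could be generated and scored. It is not DSR-Bench's actual code. The function names (`generate_queue_task`, `replay`, `score`), the prompt wording, the choice of a FIFO queue, and the exact-match scoring rule are all assumptions made for illustration.

```python
import random
from collections import deque


def generate_queue_task(num_ops: int, seed: int) -> dict:
    """Build one hypothetical queue task: a prompt plus its ground-truth answer."""
    rng = random.Random(seed)  # seeded so the same task can be regenerated
    queue = deque()
    ops = []
    for _ in range(num_ops):
        # Dequeue only when the queue is non-empty, so every operation is valid.
        if queue and rng.random() < 0.4:
            queue.popleft()
            ops.append("dequeue()")
        else:
            value = rng.randint(0, 99)
            queue.append(value)
            ops.append(f"enqueue({value})")
    prompt = (
        "Start with an empty FIFO queue and apply these operations in order: "
        + ", ".join(ops)
        + ". Report the final queue from front to back as a list."
    )
    return {"prompt": prompt, "operations": ops, "answer": list(queue)}


def replay(operations: list) -> list:
    """Reference solver: re-execute the operation strings on a fresh queue."""
    queue = deque()
    for op in operations:
        if op == "dequeue()":
            queue.popleft()
        else:
            # Parse the integer out of "enqueue(<value>)".
            queue.append(int(op[len("enqueue("):-1]))
    return list(queue)


def score(predicted: list, expected: list) -> float:
    """Assumed exact-match scoring: 1.0 only if the whole final state is correct."""
    return 1.0 if predicted == expected else 0.0
```

In this sketch, a model's parsed answer would be passed to `score` together with `task["answer"]`. Averaging such per-instance scores over many tasks would give a value between 0 and 1, the same scale as the score quoted in the abstract, though the benchmark's real metric may differ.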
