X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

ArXiv CS.AI | 2026-06-03 | Authors: Tianxi Gao, Yufan Cai, Yusi Yuan, Jin Song Dong

Abstract

arXiv:2603.05290v2 (announce type: replace)

Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorly understood. Existing evaluations largely emphasize task-level accuracy, often conflating pattern matching with reasoning capability. We present X-RAY, an explainable reasoning analysis system that maps LLM reasoning capability using calibrated, formally verified probes. We model reasoning capability as a function of extractable structure, operationalized through formal properties such as constraint interaction, reasoning depth, and solution-space geometry. X-RAY generates probes via formal tools with controlled structural variations, enabling precise isolation of incremental structural information through formal calibration and verification. We evaluate state-of-the-art LLMs on problems ranging from junior-level to advanced in mathematics, physics, and chemistry.
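To make the idea of a probe with a controlled structural variation concrete, here is a minimal sketch in Python. It is not the X-RAY implementation: the abstract does not describe the system's tooling, so the names `Probe`, `make_probe`, and `verify` are hypothetical, and the chained-addition task is a toy stand-in chosen only because its reasoning depth can be varied one step at a time and its answer can be re-derived independently.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical toy probe: a chain of additive constraints."""
    depth: int   # number of chained steps needed to reach the answer
    prompt: str  # text that would be shown to a model
    answer: int  # ground truth computed by the generator


def make_probe(depth: int, seed: int = 0) -> Probe:
    """Build a probe whose only varying structural property is its depth."""
    rng = random.Random(seed)  # seeded so each probe is reproducible
    start = rng.randint(1, 9)
    steps = [rng.randint(1, 9) for _ in range(depth)]
    lines = [f"x0 = {start}"]
    lines += [f"x{i + 1} = x{i} + {c}" for i, c in enumerate(steps)]
    lines.append(f"What is x{depth}?")
    return Probe(depth, "\n".join(lines), start + sum(steps))


def verify(probe: Probe) -> bool:
    """Re-derive the answer from the prompt text alone and compare.

    This is a plain re-evaluation, a stand-in for the formal
    verification the abstract mentions, not a formal proof.
    """
    env: dict[str, int] = {}
    for line in probe.prompt.splitlines()[:-1]:  # skip the question line
        name, expr = line.split(" = ")
        terms = expr.split(" + ")
        env[name] = sum(env[t] if t in env else int(t) for t in terms)
    return env[f"x{probe.depth}"] == probe.answer
```

Generating `make_probe(d)` for increasing `d` yields a family of problems that differ only in chain length, so a change in model accuracy across the family can be attributed to depth rather than to surface features of the task.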
