X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

arXiv CS.AI | 2026-06-03 | Authors: Tianxi Gao, Yufan Cai, Yusi Yuan, Jin Song Dong

Abstract

arXiv:2603.05290v2. Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorly understood. Existing evaluations largely emphasize task-level accuracy, often conflating pattern matching with reasoning capability. We present X-RAY, an explainable reasoning analysis system that maps LLM reasoning capability using calibrated, formally verified probes. We model reasoning capability as a function of extractable structure, operationalized through formal properties such as constraint interaction, reasoning depth, and solution-space geometry. X-RAY generates probes with formal tools under controlled structural variations, enabling precise isolation of incremental structural information through formal calibration and verification. We evaluate state-of-the-art LLMs on problems ranging from junior-level to advanced in mathematics, physics, and chemistry.
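
The abstract describes probes whose structure is varied in a controlled way and then formally verified. The Python sketch below illustrates that idea in miniature; it is not the authors' X-RAY tooling. It assumes a toy probe family (chains of additive constraints) in which reasoning depth is the only structural knob, and it substitutes exhaustive enumeration over a small finite domain for a formal solver. All names (`ChainProbe`, `make_chain_probe`, `solutions`, `verify`) are hypothetical.

```python
import random
from dataclasses import dataclass
from itertools import product

# Hypothetical toy probe family for illustration only, not X-RAY's actual generator.


@dataclass
class ChainProbe:
    """Probe: x0 = anchor and x_i = x_{i-1} + offset_i for i = 1..depth."""
    depth: int                               # structural knob: number of chained steps
    anchor: int                              # given value of x0
    constraints: list[tuple[int, int, int]]  # (target, source, offset): x_target = x_source + offset
    answer: int                              # value of x_depth


def make_chain_probe(depth: int, seed: int = 0) -> ChainProbe:
    """Generate a probe whose only structural variable is reasoning depth."""
    rng = random.Random(seed)
    anchor = rng.randint(1, 9)
    constraints, value = [], anchor
    for i in range(1, depth + 1):
        offset = rng.randint(1, 5)
        constraints.append((i, i - 1, offset))
        value += offset
    return ChainProbe(depth, anchor, constraints, value)


def solutions(anchor, constraints, n_vars, domain):
    """Enumerate every assignment over a finite domain that satisfies the constraints.

    Exhaustive search stands in for a formal solver (an assumption of this sketch);
    it is only feasible for small n_vars and small domains.
    """
    found = []
    for rest in product(domain, repeat=n_vars - 1):
        values = (anchor,) + rest
        if all(values[t] == values[s] + off for t, s, off in constraints):
            found.append(values)
    return found


def verify(probe: ChainProbe, domain=range(30)) -> bool:
    """Calibration check: exactly one solution, and it yields the recorded answer."""
    sols = solutions(probe.anchor, probe.constraints, probe.depth + 1, domain)
    return len(sols) == 1 and sols[0][-1] == probe.answer


if __name__ == "__main__":
    for d in (1, 2, 3):
        p = make_chain_probe(d, seed=d)
        print(d, p.answer, verify(p))
    # Removing the last constraint leaves x_depth unconstrained, so the solution space grows.
    p = make_chain_probe(3, seed=3)
    print(len(solutions(p.anchor, p.constraints[:-1], p.depth + 1, range(30))))
```

Dropping a single constraint turns a one-solution probe into one whose final variable can take any value in the domain, which is one concrete sense in which structural information can be added or removed in measured increments. A practical pipeline would more likely call an SMT or constraint solver; enumeration simply keeps the sketch dependency-free.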
