A Comprehensive Anatomy of Human and DeepSeek-R1 LLM Mathematical Reasoning

ArXiv CS.AI | 2026-06-08 | Authors: Yuxiang Chen, Jun Wang

Abstract

arXiv:2606.07410v1 (cross-list)

The emergence of "Aha moments" in large language models, particularly DeepSeek-R1-0120, has raised the question of whether these systems genuinely reason or merely imitate the appearance of reasoning. We conduct a comprehensive empirical comparison between model and human reasoning across all 30 problems from AIME 2025, exhaustively annotating 10,247 reasoning steps into five functional categories: Analysis, Inference, Branch, Backtrace, and Reflection. We find a clear structural difference. Human solutions maintain a compact alternation between analysis and deduction, whereas DeepSeek-R1 frequently revisits intermediate results, performs shallow and often unnecessary verification, and loops through local checks without meaningful logical progress. We describe this as topological mimicry: reproducing the surface form of reasoning without its functional role. Despite this, we identify two signals of genuine reasoning.
