RealMath-Eval: Why SOTA Judges Struggle with Real Human Reasoning (Event)

Name: RealMath-Eval: Why SOTA Judges Struggle with Real Human Reasoning
Start: 2026-06-10

BREAKTHROUGH | 2026-06-10 | Impact: HIGH

RealMath-Eval: Why SOTA Judges Struggle with Real Human Reasoning
arXiv:2606.10254v1 | Announce Type: cross
Abstract: While Large Language Models (LLMs) have achieved near-perfect performance in solving high-school mathematics, their ability to evaluate the diverse reasoning processes of real human students remains under-examined. To bridge this gap, we introduce RealMath-Eval, a rigorously annotated benchmark of 224 real-world exam responses from high schools. Our initial

Category: Artificial Intelligence
