TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation 文章

ArXiv CS.AI2026-05-29NEWSen作者: Yundong Kim, Heyoung Yang

摘要

arXiv:2605.29656v1 Announce Type: new Abstract: Evaluating open-ended outputs from large language models (LLMs) remains challenging due to the absence of ground truth. Existing metrics rely on final-answer accuracy or surface-level statistics, leaving the reasoning process itself unexamined. We introduce TRACE (Toulmin-based Reasoning Assessment through Constructive Elements), a metric that analyzes Chain-of-Thought (CoT) reasoning processes. Rather than judging outcomes, TRACE inspects how arguments are constructed by integrating Toulmin's argumentation theory with Flavell's metacognitive framework to assess reasoning structure. Experiments on 26.3K QA samples across 7 reasoning models show strong correlation with benchmark accuracy (r=0.74). Furthermore, TRACE is effective as a reinforcement learning reward signal, outperforming accuracy-only baselines. Together, these results indicate that logically sound reasoning leads to higher-quality answers.

TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品

相关技术查看全部 (4)