Characterizing, Evaluating, and Optimizing Complex Reasoning 文章

ArXiv CS.CL2026-06-04NEWSen作者: Haoran Zhang, Yafu Li, Zhi Wang, Zhilin Wang, Shunkai Zhang, Xiaoye Qu, Yu Cheng

摘要

arXiv:2602.08498v2 Announce Type: replace Abstract: Large Reasoning Models (LRMs) increasingly rely on reasoning traces with complex internal structures. However, existing work lacks a unified answer to three fundamental questions: (1) what defines high-quality reasoning, (2) how to reliably evaluate long, implicitly structured reasoning traces, and (3) how to use such evaluation signals for reasoning optimization. To address these challenges, we provide a unified perspective. (1) We introduce the ME$^2$ principle to characterize reasoning quality along macro- and micro-level concerning efficiency and effectiveness. (2) Built on this principle, we model reasoning traces as directed acyclic graphs (DAGs) and develop a DAG-based pairwise evaluation method, capturing complex reasoning structures. (3) Based on this method, we construct the TRM-Preference dataset and train a Thinking Reward Model (TRM) to evaluate reasoning quality at scale.

Characterizing, Evaluating, and Optimizing Complex Reasoning 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品查看全部 (8)

相关技术查看全部 (4)