InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning


Details

Source site: ArXiv CS.CL
Authors: Chengwei Wei, Jung-jae Kim, Longyin Zhang, Shengkai Chen, Nancy F. Chen
Article type: NEWS
Language: en
Publication date: 2026-06-05

Abstract

arXiv:2603.17310v2 (announce type: replace-cross)

Large Language Models (LLMs) with extended reasoning capabilities often generate verbose and redundant reasoning traces, incurring unnecessary computational cost. While existing reinforcement learning approaches address this by optimizing final response length, they neglect the quality of intermediate reasoning steps, leaving models vulnerable to reward hacking. We argue that verbosity is not merely a length problem, but a symptom of poor intermediate reasoning quality. To investigate this, we conduct an empirical study tracking the per-token predictive entropy of large reasoning models across reasoning trajectories. We find that high-quality reasoning traces exhibit two consistent properties: low uncertainty convergence and fast uncertainty descent.
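As a rough illustration of what tracking per-token predictive entropy can look like, the sketch below computes the Shannon entropy of the softmax distribution at each token position. It assumes raw next-token logits are available per position. The function names and the two summary statistics in `summarize` are hypothetical stand-ins chosen for illustration, not the paper's definitions of uncertainty convergence or descent.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one token's logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_trajectory(logits_per_token):
    """Entropy at every position of a trace, given one logit vector per token."""
    return [token_entropy(logits) for logits in logits_per_token]


def summarize(trajectory, tail=2):
    """Two hypothetical summaries (assumptions, not the paper's metrics):
    the mean entropy over the last `tail` tokens, and the average
    per-step change in entropy from the first token to the last."""
    last = trajectory[-tail:]
    final_level = sum(last) / len(last)
    mean_step_change = (trajectory[-1] - trajectory[0]) / max(len(trajectory) - 1, 1)
    return final_level, mean_step_change


# Toy trace over a 4-token vocabulary: the distribution sharpens step by step.
toy_logits = [[0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]]
trajectory = entropy_trajectory(toy_logits)
final_level, mean_step_change = summarize(trajectory)
```

In this toy trace the first position is uniform, so its entropy equals `math.log(4)`, and the entropy falls at each later position, which gives a negative `mean_step_change`.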
