On the Optimal Reasoning Length for RL-Trained Language Models · Event

PRODUCT_LAUNCH · 2026-06-11 · Impact: MEDIUM

arXiv:2602.09591v3 · Announce Type: replace

Abstract: Reinforcement learning substantially improves reasoning in large language models, but it also tends to lengthen chain-of-thought outputs and increase computational cost. Although length-control methods have been proposed, the length-accuracy relationship they induce remains unclear. We train policies with several length-control methods on multiple base models in a controlled setup
