When Self-Belief Misleads: Active Label Acquisition for Reinforcement Learning with Verifiable Rewards

ArXiv CS.CL | 2026-05-26 | NEWS | en | Authors: Li Wang, Xiaodong Lu, Xiaohan Wang, Yikun Ban, Jiajun Chai, Wei Lin, Tianhao Peng, Guojun Yin


Details

Source site: ArXiv CS.CL
Authors: Li Wang, Xiaodong Lu, Xiaohan Wang, Yikun Ban, Jiajun Chai, Wei Lin, Tianhao Peng, Guojun Yin
Article type: NEWS
Language: en
Publication date: 2026-05-26

Original text

Abstract

arXiv:2605.25864v1. Announce Type: cross.

Abstract: Large Language Models (LLMs) have achieved remarkable advancements in reasoning capabilities empowered by Reinforcement Learning with Verifiable Rewards (RLVR). Nonetheless, RLVR intrinsically relies on ground-truth labels for reward computation, the acquisition of which is often prohibitively expensive in real-world scenarios. While unsupervised RLVR paradigms attempt to circumvent this by training on pseudo-labels, they are notoriously susceptible to training collapse. Moreover, different samples often exhibit varying annotation values. In this paper, we propose Reinforcement Learning with Active Verifiable Rewards (RLAVR), which actively acquires ground-truth labels for a small set of selected samples and integrates them with pseudo-labels, thereby stabilizing training dynamics and improving performance under limited annotation budgets.
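The general idea of mixing a few acquired ground-truth labels with pseudo-labels can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not say how samples are selected or how pseudo-labels are formed. The sketch assumes majority-vote pseudo-labels over sampled answers and assumes that the prompts with the lowest vote share are the ones sent for annotation. All names (`majority_vote`, `build_labels`, `reward`, `oracle`, `budget`) are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def majority_vote(answers: List[str]) -> Tuple[str, float]:
    """Return the most common answer and its vote share (a crude confidence)."""
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)


def build_labels(
    rollouts: Dict[str, List[str]],
    oracle: Callable[[str], str],
    budget: int,
) -> Dict[str, Tuple[str, str]]:
    """Map each prompt id to (label, source), with source "oracle" or "pseudo".

    ASSUMPTION: the prompts with the lowest vote share are sent to the oracle.
    The abstract does not specify the selection rule.
    """
    voted = {pid: majority_vote(ans) for pid, ans in rollouts.items()}
    # Rank by ascending confidence; ties are broken by prompt id for determinism.
    ranked = sorted(voted, key=lambda pid: (voted[pid][1], pid))
    queried = set(ranked[:budget])
    return {
        pid: (oracle(pid), "oracle") if pid in queried else (voted[pid][0], "pseudo")
        for pid in voted
    }


def reward(answer: str, label: str) -> float:
    """Binary verifiable reward: 1.0 on an exact match after trimming, else 0.0."""
    return 1.0 if answer.strip() == label.strip() else 0.0
```

In this toy setup, a prompt whose sampled answers disagree is the first one to spend annotation budget on, while prompts with unanimous answers keep their pseudo-label. Whether low agreement is a good proxy for annotation value is exactly the kind of question the paper's title raises, so treat the selection rule here as a placeholder.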
