Exploiting Verification-Generation Gap: Test-Time Reinforcement Learning with Confidence-Conditioned Verification

ArXiv CS.AI | 2026-06-03 | Authors: Jiahui Li, Jianfeng Shan, Wenpei Chen, Shunyu Wu, Jian Lou, Wenjie Feng, Dan Li, See-Kiong Ng
