Detecting and Mitigating the Correct-Answer Extinction Window in Test-Time Reinforcement Learning with Majority Voting

ArXiv CS.AI · 2026-05-28 · News · Authors: Hongxiang Lin, Zhirui Kuai, Erpeng Xue, Lei Wang
