Discounted Beta-Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards 文章

ArXiv CS.AI2026-05-26NEWSen作者: Haechan Kim, Soohyun Ryu, Gyouk Chu, Doohyuk Jang, Eunho Yang

Discounted Beta-Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards · 相关技术