Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems (Event)
Type: PRODUCT_LAUNCH | Date: 2026-06-11 | Impact: MEDIUM
arXiv:2505.15201v5. Announce type: replace-cross.
Abstract: Reinforcement Learning (RL) algorithms sample multiple (n>1) solution attempts for each problem and reward them independently. This optimizes for pass@1 performance and prioritizes the strength of isolated samples at the expense of the diversity and collective utility of sets of samples. This under-utilizes the sampling capacity, limiting exploration and eventual im…
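To make the pass@1 versus pass@k distinction in the abstract concrete, the sketch below computes the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), from n sampled attempts of which c are correct. It is only an illustration of the metric, not the paper's optimization method, which the truncated abstract does not describe; the function name `pass_at_k` and the example counts are assumptions chosen for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, c of which are correct.

    Uses the standard unbiased estimator 1 - C(n - c, k) / C(n, k):
    the probability that a random size-k subset of the n samples
    contains at least one correct attempt.
    (Hypothetical helper, not taken from the paper.)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Illustrative counts: 4 attempts on one problem, 1 of them correct.
    for k in (1, 2, 4):
        print(f"pass@{k} = {pass_at_k(4, 1, k):.2f}")
```

With one correct attempt out of four, the estimate rises from 0.25 at k=1 to 0.50 at k=2 and 1.00 at k=4, which shows how a set of samples can be useful collectively even when each individual sample is usually wrong.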
Related coverage (1)
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
ArXiv CS.CL, 2026-06-11