Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Event: PRODUCT_LAUNCH · 2026-06-11 · Impact: MEDIUM
arXiv:2505.15201v5 · Announce Type: replace-cross
Abstract: Reinforcement Learning (RL) algorithms sample multiple (n > 1) solution attempts for each problem and reward them independently. This optimizes for pass@1 performance and prioritizes the strength of isolated samples at the expense of the diversity and collective utility of sets of samples. This under-utilizes the sampling capacity, limiting exploration and eventual im
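To make the pass@1 versus pass@k distinction in the abstract concrete, the following is a minimal sketch of the commonly used unbiased pass@k estimator: given n sampled attempts of which c are correct, it computes the probability that at least one of k attempts drawn without replacement is correct. This is an illustration only and is not taken from the abstract; the function name `pass_at_k` and the example numbers are assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, c of which are correct.

    Illustrative helper (hypothetical name, not from the paper's abstract).
    Returns the probability that a size-k subset of the n attempts,
    drawn without replacement, contains at least one correct attempt.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the all-incorrect subsets; it is 0 when k > n - c.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Assumed example: 8 attempts on one problem, only 1 of them correct.
p1 = pass_at_k(n=8, c=1, k=1)  # chance a single sample is correct
p4 = pass_at_k(n=8, c=1, k=4)  # chance a set of 4 samples contains a correct one
```

With these assumed numbers, a problem that looks weak under pass@1 scores much higher under pass@4, which illustrates why rewarding samples independently can undervalue the collective utility of a set of samples.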
Related coverage
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
ArXiv CS.CL · 2026-06-11