Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems · Event

PRODUCT_LAUNCH · 2026-06-11 · Impact: MEDIUM

arXiv:2505.15201v5 · Announce Type: replace-cross

Abstract: Reinforcement Learning (RL) algorithms sample multiple (n>1) solution attempts for each problem and reward them independently. This optimizes for pass@1 performance and prioritizes the strength of isolated samples at the expense of the diversity and collective utility of sets of samples. This under-utilizes the sampling capacity, limiting exploration and eventual im
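The abstract contrasts pass@1 with the collective utility of a set of samples but does not state a formula. As a rough illustration only, the sketch below uses the combinatorial pass@k estimator that is common in code-generation evaluation; it is an assumption that this matches the quantity the paper has in mind, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct.

    n: number of attempts sampled for a problem
    c: number of those attempts that were correct
    k: size of the subset being scored (1 <= k <= n)

    Assumption: this is the widely used combinatorial estimator,
    1 - C(n - c, k) / C(n, k), not a formula taken from the abstract.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Four attempts, one correct: independent 0/1 rewards average to pass@1.
rewards = [1, 0, 0, 0]
mean_reward = sum(rewards) / len(rewards)
p1 = pass_at_k(n=4, c=1, k=1)
p2 = pass_at_k(n=4, c=1, k=2)
```

With one correct attempt out of four, the mean of the independent rewards equals the pass@1 estimate, while scoring pairs of attempts gives a higher value, which is the gap between isolated samples and sets of samples that the abstract describes.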

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems · Related companies

ISO (NONPROFIT)
AT&T (COMPANY)
MIT (UNIVERSITY)
arXiv (NONPROFIT)
IREC (NONPROFIT)
EARN (NONPROFIT)
ACT (NONPROFIT)
Ratio (RESEARCH_INSTITUTE)