Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems (Event)

Name: Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Start: 2026-06-11

PRODUCT_LAUNCH · 2026-06-11 · Impact: MEDIUM

arXiv:2505.15201v5 · Announce Type: replace-cross

Abstract: Reinforcement Learning (RL) algorithms sample multiple (n > 1) solution attempts for each problem and reward them independently. This optimizes for pass@1 performance and prioritizes the strength of isolated samples at the expense of the diversity and collective utility of sets of samples. This under-utilizes the sampling capacity, limiting exploration and eventual im…
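To make the pass@1 versus pass@k distinction concrete, here is a minimal sketch assuming binary (0/1) rewards for the n sampled attempts. The mean of independent per-sample rewards corresponds to pass@1. The pass@k value uses the standard unbiased estimator 1 - C(n-c, k) / C(n, k), where c is the number of correct samples. The function names `pass_at_1` and `pass_at_k` are hypothetical, and this is not the paper's PKPO reward transformation. It only shows how a set of samples can be scored collectively rather than one at a time.

```python
from math import comb


def pass_at_1(rewards):
    # Mean of independent binary rewards: the quantity that per-sample
    # rewarding effectively optimizes (hypothetical helper).
    return sum(rewards) / len(rewards)


def pass_at_k(rewards, k):
    # Unbiased estimate of the probability that at least one of k samples,
    # drawn without replacement from the n attempts, is correct
    # (hypothetical helper).
    n = len(rewards)
    c = sum(rewards)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# One correct attempt out of four.
rewards = [1, 0, 0, 0]
print(pass_at_1(rewards))     # 0.25
print(pass_at_k(rewards, 2))  # 0.5
print(pass_at_k(rewards, 4))  # 1.0
```

The same four attempts score 0.25 under pass@1 but 1.0 under pass@4. A set containing one correct answer is fully useful when k attempts are allowed, which is the collective utility that independent rewarding ignores.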

Artificial Intelligence
