Cast a Wider Net: Coordinated Pass@K Policy Optimization for Code Reasoning
Event: PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM
arXiv:2605.27000v1 (Announce Type: new)
Abstract: Repeated sampling with a verifier is the standard way to allocate test-time compute for code generation, with pass@$K$ as the canonical metric. Yet the standard policy class draws $K$ independent samples from a single answer distribution, so attempts often collapse onto near-duplicate reasoning paths and waste the budget on redundant rollouts. This failure is costly in com
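For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how pass@K is commonly estimated from verifier results. It uses the widely used unbiased combinatorial estimator, not anything specific to this paper; the function name `pass_at_k` and its parameters are illustrative assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (hypothetical helper, not from the paper).

    n: total samples drawn for the problem
    c: number of those samples accepted by the verifier
    k: attempt budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that a random size-k subset is all failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 3 passing samples out of 10, evaluated at several budgets.
    for budget in (1, 5, 10):
        print(budget, pass_at_k(10, 3, budget))
```

The estimator treats the n samples as exchangeable draws, so it rewards a larger budget only to the extent that additional attempts differ from earlier ones; near-duplicate rollouts, the failure mode the abstract describes, add little.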
Related coverage (1)
Cast a Wider Net: Coordinated Pass@K Policy Optimization for Code Reasoning
ArXiv CS.CL | 2026-06-02