CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test

Event: PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM

arXiv:2605.23491v2. Announce type: replace-cross.

Abstract: Recently, Reinforcement Learning with Verifiable Rewards (RLVR) and Test-Time Scaling (TTS) have advanced LLM code generation through executable verification. Yet Ground-Truth Unit Tests (GT UTs) remain a bottleneck: state-of-the-art RLVR methods require them for costly training, while existing TTS methods lose competitiveness without them. This motivates GT-free TTS,