SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks
Event: PRODUCT_LAUNCH
Date: 2026-06-01
Impact: MEDIUM
arXiv:2605.31433v1
Announce Type: new
Abstract: Self-play can train language models without external supervision. However, existing methods require rule-checkable answers, leaving open-ended tasks dependent on curated prompts or frontier-model judges. We introduce SCOPE, a data-free self-play framework for open-ended tasks that co-evolves two policies: a Challenger that generates document-grounded tasks, and a Solver that answers th