SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks (Event)

Name: SCOPE: Self-Play via Co-Evolving Policies for Open-Ended Tasks
Start: 2026-06-01

PRODUCT_LAUNCH · 2026-06-01 · Impact: MEDIUM

arXiv:2605.31433v1 (Announce Type: new)

Abstract: Self-play can train language models without external supervision. However, existing methods require rule-checkable answers, leaving open-ended tasks dependent on curated prompts or frontier-model judges. We introduce SCOPE, a data-free self-play framework for open-ended tasks that co-evolves two policies: a Challenger that generates document-grounded tasks, and a Solver that answers th

Artificial Intelligence
