Global Policy-Space Response Oracles for Two-Player Zero-Sum Games

ArXiv CS.AI | 2026-05-28 | NEWS | en
Authors: Junyu Zhang, Feihong Yang, Jian Wang, Chao Wang, Xudong Zhang

Abstract

arXiv:2605.28273v1 | Announce Type: new

The Policy-Space Response Oracles (PSRO) framework scales equilibrium computation to large zero-sum games by iteratively expanding a restricted strategy set using deep reinforcement learning (DRL). A central challenge is to construct, under limited computational budgets, a small strategy population whose induced game closely approximates the full game. Existing PSRO variants typically expand the population using best responses to meta-strategies computed from restricted-game payoffs, which can lead to inefficient expansions that provide limited global improvement. We propose to guide population expansion by directly evaluating the post-expansion population quality. Specifically, we adopt Population Exploitability (PE) to measure how well a restricted strategy set represents the full game, and introduce a two-phase exploration-selection framework that explicitly minimizes PE during expansion.
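The PE-based selection idea can be illustrated on a small matrix game. The sketch below is a minimal, hypothetical illustration, not the authors' method. It assumes that PE is the smallest exploitability (NashConv) attainable by any pair of mixtures over the two populations, which is one common formalization; the paper's exact definition or normalization may differ. Under that assumption PE splits into two matrix-game values, which the sketch approximates with regret-matching self-play rather than an exact linear-program solver. The names `game_value`, `population_exploitability`, and `select_expansion`, the toy payoff matrix, and the fixed candidate lists (stand-ins for policies that an exploration phase would train with DRL) are all invented for illustration.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def _regret_matching(regrets: List[float]) -> List[float]:
    """Mix actions in proportion to positive cumulative regret (uniform if none)."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in pos]


def game_value(A: Matrix, iters: int = 2000) -> float:
    """Approximate value of the zero-sum matrix game A (row player maximizes A).

    Runs regret-matching self-play; the time-averaged strategies approach the set
    of Nash equilibria, and the midpoint of the bounds they certify is returned.
    """
    m, n = len(A), len(A[0])
    reg_r, reg_c = [0.0] * m, [0.0] * n
    sum_r, sum_c = [0.0] * m, [0.0] * n
    for _ in range(iters):
        x, y = _regret_matching(reg_r), _regret_matching(reg_c)
        row_u = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        col_u = [sum(A[i][j] * x[i] for i in range(m)) for j in range(n)]
        v = sum(x[i] * row_u[i] for i in range(m))  # expected payoff to the row player
        for i in range(m):
            reg_r[i] += row_u[i] - v
            sum_r[i] += x[i]
        for j in range(n):
            reg_c[j] += v - col_u[j]  # the column player wants A to be small
            sum_c[j] += y[j]
    xb = [s / iters for s in sum_r]
    yb = [s / iters for s in sum_c]
    lower = min(sum(A[i][j] * xb[i] for i in range(m)) for j in range(n))
    upper = max(sum(A[i][j] * yb[j] for j in range(n)) for i in range(m))
    return 0.5 * (lower + upper)


def population_exploitability(A: Matrix, rows: List[int], cols: List[int],
                              iters: int = 2000) -> float:
    """Assumed PE: min over sigma1 in conv(rows), sigma2 in conv(cols) of
    [max_pi1 u(pi1, sigma2)] - [min_pi2 u(sigma1, pi2)].

    The minimization separates, and by the minimax theorem each term is the
    value of a game in which only one player is restricted to its population.
    """
    col_restricted = [[A[i][j] for j in cols] for i in range(len(A))]
    row_restricted = [list(A[i]) for i in rows]
    return game_value(col_restricted, iters) - game_value(row_restricted, iters)


def select_expansion(A: Matrix, rows: List[int], cols: List[int],
                     row_cands: List[int], col_cands: List[int],
                     iters: int = 2000) -> Tuple[int, int, float]:
    """Selection phase (sketch): add the candidate pair with the lowest post-expansion PE."""
    best: Optional[Tuple[int, int, float]] = None
    for r in row_cands:
        for c in col_cands:
            pe = population_exploitability(A, rows + [r], cols + [c], iters)
            if best is None or pe < best[2]:
                best = (r, c, pe)
    assert best is not None, "need at least one candidate per player"
    return best


if __name__ == "__main__":
    # Hypothetical toy game: rows a, b, c vs. columns x, y, z (row player's payoffs).
    A = [[0.0, 0.0, 0.0],     # a: neutral
         [1.0, -1.0, -1.0],   # b: beats x, loses to y and z
         [0.5, 0.5, 0.5]]     # c: a safe 0.5 against every column
    print(population_exploitability(A, [0], [0]))     # about 1.0
    print(population_exploitability(A, [0, 1], [0]))  # add b (best response to x): still about 1.0
    print(population_exploitability(A, [0, 2], [0]))  # add c instead: about 0.5
    print(select_expansion(A, [0], [0], [1, 2], [1, 2]))  # row c (index 2), PE about 0
```

In this toy game, row b is the best response to the restricted meta-strategy (column x), yet adding it leaves PE at about 1 when the column population is held fixed, while adding the safer row c lowers it to about 0.5. Because the assumed PE separates into a column-population term and a row-population term, the joint loop in `select_expansion` could equally be run as two independent per-player searches.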
