$\pi$-Play: Multi-Agent Self-Play via Privileged Self-Distillation without External Data

ArXiv CS.CL | 2026-05-26 | NEWS | en | Authors: Yaocheng Zhang, Yuanheng Zhu, Wenyue Chong, Songjun Tu, Qichao Zhang, Jiajun Chai, Xiaohan Wang, Wei Lin, Guojun Yin, Dongbin Zhao

Details

Source site: ArXiv CS.CL
Authors: Yaocheng Zhang, Yuanheng Zhu, Wenyue Chong, Songjun Tu, Qichao Zhang, Jiajun Chai, Xiaohan Wang, Wei Lin, Guojun Yin, Dongbin Zhao
Article type: NEWS
Language: en
Publication date: 2026-05-26

Original text

Abstract

arXiv:2604.14054v2. Announce type: replace-cross. Abstract: Deep search agents have emerged as a promising paradigm for addressing complex information-seeking tasks, but their training remains challenging due to sparse rewards, weak credit assignment, and limited labeled data. Self-play offers a scalable route to reduce data dependence, but conventional self-play optimizes students only through sparse outcome rewards, leading to low learning efficiency. In this work, we observe that self-play naturally produces a question construction path (QCP) during task generation, an intermediate artifact that captures the reverse solution process. This reveals a new source of privileged information: self-play can supply high-quality privileged information for self-distillation at low cost and at scale, without relying on human feedback or curated privileged information.
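
To make the idea of privileged self-distillation more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's method: the abstract does not specify the training objective, so a per-token KL divergence (a common distillation loss) is assumed. The function names and the setup, in which the same model acts as a teacher when it sees the QCP and as a student when it does not, are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def privileged_distillation_loss(teacher_logits, student_logits):
    """Mean per-token KL(teacher || student).

    ASSUMPTION (not from the abstract): teacher_logits come from the model
    when its prompt includes the question construction path (QCP), and
    student_logits come from the same model without the QCP. Each argument
    is a list of per-token logit lists over the same vocabulary.
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("need two non-empty sequences of equal length")
    per_token = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)
```

In this sketch the loss is zero when the student already matches the QCP-conditioned teacher at every token, and it grows as the two distributions diverge, which gives a dense per-token signal in contrast to a single sparse outcome reward.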
