Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

ArXiv CS.CL | 2026-06-16 | NEWS | en | Authors: Jacob Dineen, Aswin RRV, Zhikun Xu, Ben Zhou

Details

Source site
ArXiv CS.CL
Authors
Jacob Dineen, Aswin RRV, Zhikun Xu, Ben Zhou
Article type
NEWS
Language
en
Publication date
2026-06-16

Abstract

arXiv:2604.03472v3 (announce type: replace)

Co-evolutionary self-play, where one language model generates problems and another solves them, promises autonomous curriculum learning without human supervision. In practice, the proposer quickly converges to a narrow distribution of problems that satisfy the reward function. This diversity collapse renders the curriculum uninformative for the solver, stalling the co-evolutionary loop. We introduce vocabulary dropout, a random mask applied to the proposer's output logits during both policy training and curriculum generation, as a lightweight mechanism to sustain diversity. The mask is hard and non-stationary, preventing the proposer from locking into fixed token sequences. Training Qwen3-4B and Qwen3-8B on mathematical reasoning via R-Zero, we find that vocabulary dropout sustains proposer diversity across lexical, semantic, and functional metrics throughout training. It also yields solver improvements averaging +4.
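
The masking idea in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. It only shows what a hard, non-stationary mask on output logits might look like: "hard" is read here as setting dropped tokens' logits to negative infinity (zero probability), and "non-stationary" as drawing a fresh mask on every call. The function names, the `drop_prob` parameter, the `protected_ids` argument (for tokens such as an end-of-sequence marker), and the choice of when to resample are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def sample_keep_mask(vocab_size, drop_prob, rng, protected_ids=()):
    """Draw a fresh hard mask; True means the token stays available."""
    keep = [rng.random() >= drop_prob for _ in range(vocab_size)]
    for token_id in protected_ids:  # assumption: some tokens are never dropped
        keep[token_id] = True
    if not any(keep):  # guard: always leave at least one token available
        keep[rng.randrange(vocab_size)] = True
    return keep


def apply_mask(logits, keep):
    """Hard mask: dropped tokens get a logit of -inf, i.e. zero probability."""
    return [x if k else -math.inf for x, k in zip(logits, keep)]


def softmax(logits):
    """Numerically stable softmax; -inf logits map to exactly 0.0."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(probs, rng):
    """Sample one token id from a probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 1.0, 0.5, 0.0, -1.0, -2.0]  # toy 6-token vocabulary
    for step in range(3):
        # A new mask each step, so no fixed token subset is always available.
        keep = sample_keep_mask(len(logits), 0.5, rng, protected_ids=(5,))
        probs = softmax(apply_mask(logits, keep))
        print(step, keep, sample_token(probs, rng))
```

Because the mask is redrawn on every call, a token sequence that was sampled under one mask may be impossible under the next, which is one way to read the abstract's claim that the proposer cannot lock into fixed token sequences. In a real training setup the same masking would be applied to the model's logit tensor rather than to a Python list.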
