Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution (Article)

ArXiv CS.CL | 2026-06-16 | NEWS | en | Authors: Jacob Dineen, Aswin RRV, Zhikun Xu, Ben Zhou

Details

Source site: ArXiv CS.CL
Authors: Jacob Dineen, Aswin RRV, Zhikun Xu, Ben Zhou
Article type: NEWS
Language: en
Publication date: 2026-06-16

Abstract

arXiv:2604.03472v3 (announce type: replace)

Co-evolutionary self-play, where one language model generates problems and another solves them, promises autonomous curriculum learning without human supervision. In practice, the proposer quickly converges to a narrow distribution of problems that satisfy the reward function. This diversity collapse renders the curriculum uninformative for the solver, stalling the co-evolutionary loop. We introduce vocabulary dropout, a random mask applied to the proposer's output logits during both policy training and curriculum generation, as a lightweight mechanism to sustain diversity. The mask is hard and non-stationary, preventing the proposer from locking into fixed token sequences. Training Qwen3-4B and Qwen3-8B on mathematical reasoning via R-Zero, we find that vocabulary dropout sustains proposer diversity across lexical, semantic, and functional metrics throughout training. It also yields solver improvements averaging +4.
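The masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the mask is drawn independently per token with a fixed probability and is redrawn on every call, which is one way to make it non-stationary; the abstract does not state the actual resampling schedule or dropout rate. The names `vocabulary_dropout`, `drop_prob`, `softmax`, and `sample` are hypothetical, and the guard that restores one token when everything is masked is an added safety assumption.

```python
import math
import random


def vocabulary_dropout(logits, drop_prob, rng):
    """Return a copy of `logits` with a random subset hard-masked to -inf.

    Hard mask: a dropped token gets probability exactly zero after softmax.
    Non-stationary (assumed): a fresh mask is drawn on every call.
    """
    masked = [
        -math.inf if rng.random() < drop_prob else value
        for value in logits
    ]
    # Assumed safeguard: never mask the whole vocabulary. If that happens,
    # restore the highest-scoring original token so sampling stays defined.
    if all(value == -math.inf for value in masked):
        best = max(range(len(logits)), key=lambda i: logits[i])
        masked[best] = logits[best]
    return masked


def softmax(logits):
    """Numerically stable softmax; exp(-inf) evaluates to 0.0."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, rng):
    """Sample one token index from the (possibly masked) logits."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    toy_logits = [2.0, 1.0, 0.5, -1.0, 0.0, 3.0]  # toy 6-token vocabulary
    for step in range(3):
        masked = vocabulary_dropout(toy_logits, drop_prob=0.5, rng=rng)
        print(step, masked, sample(masked, rng))
```

Because the mask sets logits to negative infinity rather than scaling them, a dropped token cannot be sampled at all in that step, so the proposer has to route around it instead of repeating one memorized token sequence.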
