Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

Event: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM

arXiv:2605.30789v1 | Announce Type: cross

Abstract: We identify a new dimension for enhancing rollout diversity in Group Relative Policy Optimization (GRPO) for LLMs. While GRPO relies on diverse rollouts, prevailing strategies primarily increase diversity by injecting more token-level randomness, which may introduce step-wise noise and lead to incoherent trajectories. We uncover that smaller models within the same model fami
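To illustrate why GRPO depends on diverse rollouts, here is a minimal sketch of the group-relative advantage that gives the method its name: each rollout's reward is normalized against the mean and standard deviation of its own group. This is not the paper's method, only the standard GRPO normalization step. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are assumptions made for this example.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group (GRPO-style).

    Hypothetical helper for illustration; `eps` guards against division
    by zero when every rollout in the group earns the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]


# A group whose rollouts reach different outcomes yields a usable signal.
diverse = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# A group whose rollouts all agree yields zero advantage for every rollout,
# so that prompt contributes no gradient signal.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

When every rollout in a group receives the same reward, all advantages collapse to zero, which is one reason rollout diversity matters for GRPO.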
