Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation · Event

PRODUCT_LAUNCH · 2026-06-02 · Impact: MEDIUM

arXiv:2605.09253v2. Announce Type: replace.

Abstract: While recent work in Reinforcement Learning with Verifiable Rewards (RLVR) has shown that a small subset of critical tokens disproportionately drives reasoning gains, an analogous token-level understanding of On-Policy Distillation (OPD) remains largely unexplored. In this work, we investigate high-loss tokens, a token type that--as the most direct signal o

Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation · Related people