OPRD: On-Policy Representation Distillation (Event)

Name: OPRD: On-Policy Representation Distillation
Start: 2026-06-06

PRODUCT_LAUNCH | 2026-06-06 | Impact: MEDIUM

OPRD: On-Policy Representation Distillation (arXiv:2606.06021v1, announce type: cross)

Abstract: On-policy distillation (OPD) supervises the student only in output space by matching next-token probabilities. This output-only paradigm has two limits: (1) sampling variance from Monte Carlo KL estimates over large vocabularies (e.g., Qwen's ~150k tokens) persists throughout training, and (2) it treats the teacher as a black box, discarding all intermediate hidden states after the LM head. We propose
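The first limit named in the abstract, sampling variance in Monte Carlo KL estimates, can be illustrated with a small sketch. It is not the paper's code. It assumes a reverse KL, KL(student || teacher), estimated from tokens sampled by the student; the abstract does not state the direction. The toy vocabulary size, the random logits, and the helper names (softmax, exact_reverse_kl, mc_reverse_kl) are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_reverse_kl(student, teacher):
    """Exact KL(student || teacher): a full sum over the vocabulary."""
    return sum(p * (math.log(p) - math.log(q)) for p, q in zip(student, teacher))


def mc_reverse_kl(student, teacher, num_samples, rng):
    """Monte Carlo estimate: average the log-ratio over tokens sampled from the student."""
    tokens = rng.choices(range(len(student)), weights=student, k=num_samples)
    return sum(math.log(student[t]) - math.log(teacher[t]) for t in tokens) / num_samples


rng = random.Random(0)
vocab_size = 1000  # assumption: toy size, far below the ~150k tokens cited in the abstract

# Assumption: independent random logits stand in for real student and teacher outputs.
student = softmax([rng.gauss(0.0, 1.0) for _ in range(vocab_size)])
teacher = softmax([rng.gauss(0.0, 1.0) for _ in range(vocab_size)])

exact = exact_reverse_kl(student, teacher)

# Repeat the single-sample estimate many times to measure its spread.
single = [mc_reverse_kl(student, teacher, 1, rng) for _ in range(2000)]
mean_single = sum(single) / len(single)
var_single = sum((x - mean_single) ** 2 for x in single) / (len(single) - 1)

print(f"exact KL:                 {exact:.4f}")
print(f"mean of 1-sample estimates: {mean_single:.4f}")
print(f"variance of 1-sample estimates: {var_single:.4f}")
```

The single-sample estimator is unbiased, so its mean lands near the exact value, but each individual estimate can be far off. That spread depends on the gap between the two distributions rather than on the training step, which is one way to read the abstract's remark that the variance persists throughout training.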

Artificial Intelligence

Relationship Graph

OPRD: On-Policy Representation Distillation · Related People

Aime

Sam