Robust Reasoning via Dynamic Token Selection for Distribution-Aligned Self-Distillation
Event: PRODUCT_LAUNCH | Date: 2026-06-02 | Impact: MEDIUM
arXiv:2606.00628v1
Announce Type: new
Abstract: Self-distillation improves learning efficiency by rewriting reference answers as training data that better matches the model's own distribution. However, reference answers also introduce strong stylistic biases, causing the generative model to imitate surface forms rather than learn useful reasoning patterns. We observe that the rewriting data contains a large