Robust Reasoning via Dynamic Token Selection for Distribution-Aligned Self-Distillation (Event)

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.00628v1 (Announce Type: new)

Abstract: Self-distillation improves learning efficiency by rewriting reference answers as training data that better matches the model's own distribution. However, reference answers also introduce strong stylistic biases, causing the generative model to imitate surface forms rather than learn useful reasoning patterns. We observe that the rewriting data contains a large …
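The abstract is cut off before it states how tokens are chosen, so the following is only a generic sketch of what token-level selection in a distillation loss can look like, not the paper's method. It assumes per-token log-probabilities of the rewritten target under the model are already available. The rule used here, keeping the `keep_ratio` fraction of tokens the model finds least likely, is a hypothetical stand-in, and the names `select_tokens`, `masked_nll`, and `keep_ratio` are illustrative only.

```python
import math
from typing import List, Sequence


def select_tokens(token_logprobs: Sequence[float], keep_ratio: float) -> List[bool]:
    """Hypothetical selection rule (not from the paper): keep the fraction
    `keep_ratio` of tokens with the lowest model log-probability and drop
    the rest from the training loss."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n = len(token_logprobs)
    # Number of tokens to keep, rounded to the nearest integer, at least one.
    k = max(1, round(keep_ratio * n))
    # Token indices ordered from least likely to most likely under the model.
    order = sorted(range(n), key=lambda i: token_logprobs[i])
    keep = set(order[:k])
    return [i in keep for i in range(n)]


def masked_nll(token_logprobs: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean negative log-likelihood over the selected tokens only;
    returns 0.0 when no token is selected."""
    selected = [-lp for lp, m in zip(token_logprobs, mask) if m]
    if not selected:
        return 0.0
    return sum(selected) / len(selected)


if __name__ == "__main__":
    logprobs = [-0.1, -2.3, -0.05, -1.2]  # made-up per-token log-probs
    mask = select_tokens(logprobs, keep_ratio=0.5)
    print(mask)                          # → [False, True, False, True]
    print(masked_nll(logprobs, mask))    # ≈ 1.75
```

Masking the loss, rather than deleting tokens from the sequence, leaves the rewritten text intact as context while only the chosen positions contribute gradient. Any real criterion (for example, one that targets stylistic tokens) would replace `select_tokens`.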