Distilling LLM Feedback for Lean Theorem Proving · Event

PRODUCT_LAUNCH · 2026-06-01 · Impact: MEDIUM

arXiv:2605.30861v1 (announce type: new)

Abstract: Post-training for reasoning models typically combines supervised fine-tuning with reinforcement learning from verifiable rewards, most commonly with GRPO. However, GRPO suffers from sparse rewards, limited exploration, and mode collapse. Building on recent work on self-distillation, we propose Feedback Distillation, a training method where the model is trained to match, at the token lev
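
The sparse-reward problem the abstract attributes to GRPO can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the commonly described GRPO normalization, where each sampled attempt's reward is centered and scaled by the statistics of its own sampling group, and it assumes a binary reward (1.0 when Lean accepts a proof, 0.0 otherwise). The function name, the group size of eight, and the use of the population standard deviation are illustrative choices; real implementations differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group (GRPO-style).

    Assumption: population standard deviation plus a small epsilon;
    actual training code may use a different estimator.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of eight proof attempts for a single theorem.
all_failed = [0.0] * 8          # no attempt was accepted by the verifier
one_solved = [0.0] * 7 + [1.0]  # exactly one attempt was accepted

# Every advantage is zero, so this group contributes no learning signal.
print(group_relative_advantages(all_failed))

# One positive advantage for the accepted proof, negative for the rest.
print(group_relative_advantages(one_solved))
```

When every attempt in a group fails, which is common for hard theorems, all advantages collapse to zero and the group yields no gradient. This is one way to read the sparse-reward limitation that motivates a denser, token-level training signal.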

Distilling LLM Feedback for Lean Theorem Proving · Related People