Event: Learning to Adapt SFT Data for Better Reasoning Generalization
PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM
arXiv:2605.26924v1 (Announce Type: new)

Abstract: Large language models (LLMs) have achieved remarkable progress, with post-training playing a crucial role in enhancing their reasoning capabilities. Among post-training paradigms, supervised fine-tuning (SFT) is widely used: it leverages external data to provide dense supervision and enables efficient training. However, directly fine-tuning on expert data can hurt generalization when t
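The abstract describes SFT as using external (expert) data to provide dense supervision. The snippet below is a minimal sketch of what "dense" means. It assumes the common token-level cross-entropy objective with prompt tokens masked out. The abstract does not specify the loss, and this is not the paper's data-adaptation method. Under this objective, every response token contributes its own loss term, in contrast to a single sequence-level score. The names `sft_loss` and `log_softmax` and the toy inputs are hypothetical.

```python
import math
from typing import List, Sequence

def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def sft_loss(logits: Sequence[Sequence[float]],
             targets: Sequence[int],
             loss_mask: Sequence[int]) -> float:
    """Mean token-level cross-entropy against expert (teacher) targets.

    Assumed layout (hypothetical, for illustration only):
      logits[t]    : model scores over the vocabulary at position t
      targets[t]   : expert token id at position t
      loss_mask[t] : 1 if position t is a supervised response token, 0 for prompt
    """
    total, count = 0.0, 0
    for row, y, m in zip(logits, targets, loss_mask):
        if m:  # each response token gets its own supervision signal
            total += -log_softmax(row)[y]
            count += 1
    return total / count if count else 0.0

# Toy example: vocabulary of 3, one prompt token followed by three response tokens.
logits = [[0.0, 0.0, 0.0]] * 4
targets = [2, 0, 1, 2]
mask = [0, 1, 1, 1]
print(sft_loss(logits, targets, mask))  # uniform logits give log(3), about 1.0986
```

Averaging over supervised tokens only (rather than over the whole sequence) is one common convention. Other implementations normalize per batch or per sequence, which changes how long responses are weighted.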