Learning to Adapt SFT Data for Better Reasoning Generalization

ArXiv CS.CL, 2026-05-27. Authors: Lisong Sun, Li Wang, Chen Zhang, Jinyang Wu, Kui Zhang, Tianhao Peng, Wenjun Wu

Abstract

arXiv:2605.26924v1 (Announce Type: new)

Large language models (LLMs) have achieved remarkable progress, with post-training playing a crucial role in enhancing their reasoning capabilities. Among post-training paradigms, supervised fine-tuning (SFT) is widely used: it leverages external data to provide dense supervision and enables efficient training. However, directly fine-tuning on expert data can hurt generalization when the data distribution is mismatched with the target model's own distribution. In this work, we propose Data Adaptation for Reasoning Tuning (DART), which formulates the use of a fixed, potentially distributionally misaligned SFT dataset as an optimization problem over demonstration transformations. DART trains a mapper model with reinforcement learning to convert original SFT data into model-adapted supervision that better matches the target model's distribution and learning preferences.
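The abstract describes DART only at a high level: a mapper trained with reinforcement learning to transform SFT demonstrations so they better match the target model's distribution. The toy sketch here illustrates the general shape of such a loop under heavy simplifying assumptions and is not the authors' implementation. The mapper is reduced to a softmax policy over three hypothetical transforms (`keep_original`, `rephrase_in_model_style`, `drop_steps`) instead of a generative model. The reward (a stand-in target-model log-likelihood of the transformed demonstration, with a penalty when the final answer is lost) and every number in `TARGET_LOGLIK` are invented for illustration. The update rule is plain REINFORCE with a moving-average baseline, which may differ from the RL algorithm the paper actually uses.

```python
import math
import random

# Hypothetical transforms the toy mapper can apply to one SFT demonstration.
TRANSFORMS = ["keep_original", "rephrase_in_model_style", "drop_steps"]

# Assumed stand-in for the target model: average log-likelihood it assigns to a
# demonstration rewritten with each transform (higher = closer to its own distribution).
TARGET_LOGLIK = {"keep_original": -2.5, "rephrase_in_model_style": -1.0, "drop_steps": -0.8}

# Assumed: whether each transform keeps the demonstration's final answer correct.
PRESERVES_ANSWER = {"keep_original": True, "rephrase_in_model_style": True, "drop_steps": False}


def reward(transform: str) -> float:
    """Hypothetical reward: target log-likelihood, heavily penalized if the answer is lost."""
    if not PRESERVES_ANSWER[transform]:
        return -5.0
    return TARGET_LOGLIK[transform]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_mapper(steps=2000, lr=0.1, seed=0):
    """Train the toy mapper policy with REINFORCE and a moving-average baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(TRANSFORMS)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        # Sample a transform from the current policy.
        a = rng.choices(range(len(TRANSFORMS)), weights=probs)[0]
        r = reward(TRANSFORMS[a])
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        # Gradient of log softmax(a) w.r.t. logit j is 1[j == a] - p_j.
        for j in range(len(logits)):
            grad = (1.0 if j == a else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    for name, p in zip(TRANSFORMS, train_mapper()):
        print(f"{name}: {p:.3f}")
```

Calling `train_mapper()` returns the learned probabilities over `TRANSFORMS`. Under these assumed rewards, probability mass should concentrate on `rephrase_in_model_style`, the transform that preserves the answer while scoring highest under the stand-in target likelihood. In a realistic setup, the mapper would generate rewritten demonstrations as text, and the reward would have to be computed by scoring each rewrite with the target model.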
