DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization 文章

ArXiv CS.CL2026-06-01NEWSen作者: Jian Mu, Tianyi Lin, Chengwei Qin, Zhongxiang Dai, Yao Shu

摘要

arXiv:2605.31455v1 Announce Type: cross Abstract: Large language models are increasingly deployed in multi-turn interactive settings where users or environments can iteratively provide lightweight feedback. Unfortunately, optimizing such behavior presents a sharp dilemma in practice: online reinforcement learning is able to effectively address multi-turn dynamics but is prohibitively expensive due to the cost of generating full correction trajectories at every update, whereas offline supervised fine-tuning (SFT) is efficient but suffers from distribution shift and behavioral collapse. To this end, we novelly propose DRIFT (Decoupled Rollouts and Importance-Weighted Fine-Tuning), a framework that operationalizes the theoretical insight that the KL-regularized RL objective is equivalent to importance-weighted supervised learning.

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据