Semi-Offline Reinforcement Learning for Optimized Text Generation 文章

ArXiv CS.CL2026-06-05NEWSen作者: Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan

详细信息

来源站点
ArXiv CS.CL
作者
Changyu Chen, Xiting Wang, Yiqiao Jin, Victor Ye Dong, Li Dong, Jie Cao, Yi Liu, Rui Yan
文章类型
NEWS
语言
en
发布日期
2026-06-05

摘要

arXiv:2306.09712v2 Announce Type: replace-cross Abstract: In reinforcement learning (RL), there are two major settings for interacting with the environment: online and offline. Online methods explore the environment at significant time cost, and offline methods efficiently obtain reward signals by sacrificing exploration capability. We propose semi-offline RL, a novel paradigm that smoothly transits from offline to online settings, balances exploration capability and training cost, and provides a theoretical foundation for comparing different RL settings. Based on the semi-offline formulation, we present the RL setting that is optimal in terms of optimization cost, asymptotic error, and overfitting error bound. Extensive experiments show that our semi-offline approach is efficient and yields comparable or often better performance compared with state-of-the-art methods.

相关事件

暂无数据

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据