Regularized Offline Policy Optimization with Posterior Hybrid Bayesian Belief

arXiv cs.AI | 2026-06-02 | Authors: Hongqiang Lin, Pengfei Wang, Nenggan Zheng

Abstract

arXiv:2606.00680v1. Offline reinforcement learning (RL) aims to optimize policies from pre-collected datasets. A central bottleneck in this setting is managing epistemic uncertainty, which arises from limited data coverage (sample-level) and from the ambiguity in identifying transition dynamics from finite data (model-level). To quantify both kinds of uncertainty in a unified way, Bayesian RL treats the dynamics model as a random variable and maintains a belief over it. Despite its theoretical appeal, policy optimization in Bayesian RL remains computationally challenging because it requires solving composite objectives that involve expectations. Prior methods either rely on search-based techniques that scale poorly or impose restrictive posterior assumptions that sacrifice the adaptability of Bayesian RL.
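To make the idea of "a belief over the dynamics model" concrete, the sketch below uses a textbook tabular setup: a Dirichlet posterior over next-state probabilities for a single (state, action) pair. This is a generic illustration and not the method proposed in the paper; the count vectors, the uniform prior of 1.0, and all function names are assumptions chosen for the example.

```python
import random

def dirichlet_posterior(counts, prior=1.0):
    """Dirichlet parameters after observing next-state counts (assumed uniform prior)."""
    return [prior + c for c in counts]

def posterior_mean(alpha):
    """Expected next-state distribution under the belief."""
    total = sum(alpha)
    return [a / total for a in alpha]

def posterior_variance(alpha):
    """Per-component variance of a Dirichlet; larger means more epistemic uncertainty."""
    total = sum(alpha)
    return [a * (total - a) / (total ** 2 * (total + 1)) for a in alpha]

def sample_dynamics(alpha, rng):
    """Draw one plausible next-state distribution via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

# Hypothetical next-state counts from an offline dataset (3 possible next states).
well_covered = [40, 8, 2]    # this (state, action) pair appears often in the data
poorly_covered = [1, 0, 0]   # this pair appears only once

rng = random.Random(0)
for name, counts in [("well covered", well_covered), ("poorly covered", poorly_covered)]:
    alpha = dirichlet_posterior(counts)
    print(name)
    print("  posterior mean:", [round(p, 3) for p in posterior_mean(alpha)])
    print("  max variance:  ", round(max(posterior_variance(alpha)), 4))
    print("  one sample:    ", [round(p, 3) for p in sample_dynamics(alpha, rng)])
```

In this toy setting the poorly covered pair keeps a much wider belief than the well covered one, which is the kind of coverage-driven uncertainty the abstract refers to. Optimizing a policy against such a belief means taking expectations over sampled dynamics, which is where the computational cost mentioned above comes from.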
