PAC-Bayesian Reinforcement Learning Trains Generalizable Policies

ArXiv CS.AI | 2026-06-01 | NEWS | en | Authors: Abdelkrim Zitouni, Mehdi Hennequin, Juba Agoun, Ryan Horache, Nadia Kabachi, Omar Rivasplata


Details

Source site: ArXiv CS.AI
Authors: Abdelkrim Zitouni, Mehdi Hennequin, Juba Agoun, Ryan Horache, Nadia Kabachi, Omar Rivasplata
Article type: NEWS
Language: en
Publication date: 2026-06-01

Original text

Abstract

arXiv:2510.10544v3 (announce type: replace-cross)

We derive a novel PAC-Bayesian generalization bound for reinforcement learning that explicitly accounts for Markov dependencies in the data, through the chain's mixing time. This contributes to overcoming challenges in obtaining generalization guarantees for reinforcement learning, where the sequential nature of data breaks the independence assumptions underlying classical bounds. The new bound provides non-vacuous certificates for modern off-policy algorithms such as Soft Actor-Critic. We demonstrate the practical utility of the bound through PB-SAC, a novel algorithm that optimizes the bound during training to guide exploration. Experiments across several continuous control tasks show that the proposed approach provides meaningful confidence certificates while maintaining competitive performance.
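
To make the role of the mixing time concrete, the sketch below evaluates a classical McAllester-style PAC-Bayes bound for losses in [0, 1] and then shrinks the sample size to n / tau_mix. This is only an illustration and is not the bound derived in the paper: the abstract does not state its formula, and dividing n by the mixing time is a heuristic assumption here, loosely motivated by blocking arguments for dependent data. All function names are hypothetical.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) between diagonal Gaussians given as lists of means and std devs."""
    kl = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        kl += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return kl


def mcallester_bound(empirical_loss, kl, n, delta):
    """Classical i.i.d. PAC-Bayes bound for losses in [0, 1].

    Holds with probability at least 1 - delta over n independent samples:
        L(Q) <= L_hat(Q) + sqrt((KL(Q||P) + ln(2 * sqrt(n) / delta)) / (2 * n))
    """
    complexity = (kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n)
    return empirical_loss + math.sqrt(complexity)


def effective_sample_size(n, mixing_time):
    """ASSUMPTION (not from the paper): treat n dependent samples as
    roughly n / mixing_time independent ones."""
    return max(1, n // max(1, mixing_time))


def markov_adjusted_bound(empirical_loss, kl, n, mixing_time, delta):
    """Heuristic illustration: plug the effective sample size into the i.i.d. bound."""
    n_eff = effective_sample_size(n, mixing_time)
    return mcallester_bound(empirical_loss, kl, n_eff, delta)


if __name__ == "__main__":
    # Hypothetical prior and posterior over two policy parameters.
    kl = kl_diag_gaussians([0.2, -0.1], [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
    for tau in (1, 10, 100):
        bound = markov_adjusted_bound(0.10, kl, n=100_000, mixing_time=tau, delta=0.05)
        print(f"mixing_time={tau:>3}  bound={bound:.4f}")
```

Under these assumptions the certificate loosens as the mixing time grows, because slowly mixing trajectories carry fewer effectively independent samples. That dependence is the qualitative effect the abstract attributes to the chain's mixing time.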
