An Agency-Transferring Model-Free Policy Enhancement Technique

ArXiv CS.AI | 2026-06-09 | Authors: Anton Bolychev, Georgiy Malaniya, Sinan Ibrahim, Pavel Osinenko

Abstract

arXiv:2606.09825v1 (announce type: cross)

Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal policy available as a baseline. This paper proposes a method for embedding such a baseline into the RL training process, simultaneously improving training efficiency relative to from-scratch methods and producing a learning policy that outperforms the baseline. At each step, the method arbitrates between the baseline policy and a trainable learning policy, initially relying strongly on the baseline policy and then progressively transferring agency to the learning policy. By the end of training, the learning policy is a standalone neural network that operates without baseline policy support. The paper formalizes what it means for the baseline policy to be…

The abstract may be incomplete; see the original paper for the full text.
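The per-step arbitration described in the abstract can be illustrated with a small sketch. The excerpt does not state the actual arbitration rule or schedule, so everything below is an assumption made for illustration: a random choice between the two policies, a linearly decaying probability of deferring to the baseline, and the hypothetical names `baseline_probability`, `arbitrate`, and `p_start`. It should not be read as the authors' method.

```python
import random
from typing import Callable, Sequence, Tuple

Action = float
Policy = Callable[[Sequence[float]], Action]


def baseline_probability(step: int, total_steps: int, p_start: float = 0.95) -> float:
    """Assumed linear schedule: probability of deferring to the baseline.

    Starts at `p_start` (strong reliance on the baseline) and reaches 0.0 at
    `total_steps`, when the learning policy acts entirely on its own.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return p_start * (1.0 - frac)


def arbitrate(
    state: Sequence[float],
    baseline: Policy,
    learner: Policy,
    step: int,
    total_steps: int,
    rng: random.Random,
) -> Tuple[Action, bool]:
    """Pick the action for one step; also report whether the baseline acted."""
    use_baseline = rng.random() < baseline_probability(step, total_steps)
    action = baseline(state) if use_baseline else learner(state)
    return action, use_baseline


def demo_baseline(state: Sequence[float]) -> Action:
    """Toy stand-in for a functional but suboptimal controller (illustrative only)."""
    return -0.5 * state[0]


def demo_learner(state: Sequence[float]) -> Action:
    """Toy stand-in for the trainable policy (illustrative only)."""
    return -1.0 * state[0]
```

Under this assumed schedule, calling `arbitrate` with `step == total_steps` always returns the learner's action, which mirrors the abstract's statement that the final policy operates without baseline support. A real training loop would also update the learner from the collected transitions, which this sketch leaves out.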
