SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration

ArXiv CS.AI | 2026-06-10 | NEWS | en | Authors: Kaustubh Mani, Yann Pequignot, Vincent Mai, Liam Paull

Details

Source site: ArXiv CS.AI
Authors: Kaustubh Mani, Yann Pequignot, Vincent Mai, Liam Paull
Article type: NEWS
Language: en
Publication date: 2026-06-10

Abstract

arXiv:2606.10228v1, announce type: cross.

Safe exploration is a prerequisite for deploying reinforcement learning (RL) agents in safety-critical domains. In this paper, we approach safe exploration through the lens of epistemic uncertainty, where the actor's sensitivity to parameter perturbations serves as a practical proxy for regions of high uncertainty. We propose Sharpness-Aware Policy Optimization (SHAPO), a sharpness-aware policy update rule that evaluates gradients at perturbed parameters, making policy updates pessimistic with respect to the actor's epistemic uncertainty. Analytically, we show that this adjustment implicitly reweighs policy gradients, amplifying the influence of rare unsafe actions while tempering contributions from already safe ones, thereby biasing learning toward conservative behavior in under-explored regions.
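
The phrase "evaluates gradients at perturbed parameters" can be made concrete with a small sketch. The abstract does not give SHAPO's exact update rule, so the code below is not the authors' algorithm. It assumes a generic sharpness-aware (SAM-style) step applied to the exact policy gradient of a three-action softmax bandit. The reward values, the names `sharpness_aware_step`, `rho` and `lr`, and the choice to perturb against the normalized gradient are all illustrative assumptions.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def expected_return(theta, rewards):
    """J(theta) = sum_a pi_theta(a) * r(a) for a one-step bandit."""
    return sum(p * r for p, r in zip(softmax(theta), rewards))


def policy_gradient(theta, rewards):
    """Exact gradient of J w.r.t. the logits: dJ/dtheta_k = pi_k * (r_k - J)."""
    probs = softmax(theta)
    j = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - j) for p, r in zip(probs, rewards)]


def sharpness_aware_step(theta, rewards, lr=0.5, rho=0.1):
    """One SAM-style ascent step (an assumption, not the paper's exact rule).

    1. Compute the gradient at theta.
    2. Move a distance rho *against* it, to the locally pessimistic point.
    3. Evaluate the gradient there and apply it to the original theta.
    """
    g = policy_gradient(theta, rewards)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return list(theta)  # flat objective: nothing to update
    theta_adv = [t - rho * x / norm for t, x in zip(theta, g)]
    g_adv = policy_gradient(theta_adv, rewards)
    return [t + lr * x for t, x in zip(theta, g_adv)]


# Hypothetical rewards: action 2 plays the role of a rare unsafe action.
REWARDS = [1.0, 0.5, -5.0]

theta = [0.0, 0.0, 0.0]
for _ in range(200):
    theta = sharpness_aware_step(theta, REWARDS)

final_probs = softmax(theta)
print("action probabilities:", final_probs)
print("expected return:", expected_return(theta, REWARDS))
```

Setting `rho=0` recovers the ordinary policy-gradient step, which makes it easy to compare the two updates on the same problem. A sampled, neural-network version would replace `policy_gradient` with an estimated gradient, which this sketch does not attempt.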
