SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration

ArXiv CS.AI, 2026-06-10, NEWS, en. Authors: Kaustubh Mani, Yann Pequignot, Vincent Mai, Liam Paull

Details

Source site
ArXiv CS.AI
Authors
Kaustubh Mani, Yann Pequignot, Vincent Mai, Liam Paull
Article type
NEWS
Language
en
Publication date
2026-06-10

Abstract

arXiv:2606.10228v1. Announce type: cross.

Safe exploration is a prerequisite for deploying reinforcement learning (RL) agents in safety-critical domains. In this paper, we approach safe exploration through the lens of epistemic uncertainty, where the actor's sensitivity to parameter perturbations serves as a practical proxy for regions of high uncertainty. We propose Sharpness-Aware Policy Optimization (SHAPO), a sharpness-aware policy update rule that evaluates gradients at perturbed parameters, making policy updates pessimistic with respect to the actor's epistemic uncertainty. Analytically, we show that this adjustment implicitly reweighs policy gradients, amplifying the influence of rare unsafe actions while tempering contributions from already safe ones, thereby biasing learning toward conservative behavior in under-explored regions.
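
The idea of "evaluating gradients at perturbed parameters" can be illustrated with a minimal sketch. The abstract does not give the exact form of the perturbation, so the code below assumes the generic sharpness-aware minimization (SAM) recipe: take a normalized ascent step of radius `rho` on the loss, then apply the gradient computed at that perturbed point to the original parameters. The three-action softmax bandit, the reward values, `rho`, `lr`, and all function names are hypothetical choices for illustration, not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def loss_grad(theta, rewards):
    """Exact gradient of L(theta) = -E_{a ~ pi_theta}[r(a)] for a softmax policy."""
    pi = softmax(theta)
    expected = sum(p * r for p, r in zip(pi, rewards))
    # d E[r] / d theta_k = pi_k * (r_k - E[r]); negate to get the loss gradient.
    return [-p * (r - expected) for p, r in zip(pi, rewards)]

def sharpness_aware_step(theta, rewards, lr=0.5, rho=0.1):
    """One SAM-style update (assumed form, not taken from the paper)."""
    g = loss_grad(theta, rewards)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return list(theta)
    # Ascent step: first-order worst-case perturbation inside an L2 ball of radius rho.
    theta_adv = [t + rho * x / norm for t, x in zip(theta, g)]
    # Descent step: apply the gradient from the perturbed point to the original parameters.
    g_adv = loss_grad(theta_adv, rewards)
    return [t - lr * x for t, x in zip(theta, g_adv)]

# Hypothetical bandit: the last action stands in for a rare "unsafe" action.
rewards = [1.0, 0.5, -5.0]
theta = [0.0, 0.0, 0.0]
for _ in range(200):
    theta = sharpness_aware_step(theta, rewards)
pi = softmax(theta)
print([round(p, 4) for p in pi])
```

In this toy setting the ascent step moves the logits toward the unsafe action, so the gradient used for the update is computed at a policy that assigns that action more probability than the current one. This is one concrete way to see a pessimistic update, but it should not be read as the reweighting analysis from the paper.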
