Inverse Reinforcement Learning without an Optimal Demonstrator: A Feasible Reward Set Approach

ArXiv CS.AI | 2026-06-01 | NEWS | en | Authors: Kihyun Kim, Shripad Deshmukh, Nikos Vlassis, Jiawei Zhang

Details

Source site: ArXiv CS.AI
Authors: Kihyun Kim, Shripad Deshmukh, Nikos Vlassis, Jiawei Zhang
Article type: NEWS
Language: en
Publication date: 2026-06-01

Abstract

arXiv:2605.30903v1, Announce Type: cross

Inverse reinforcement learning (IRL) typically assumes demonstrations from a single optimal demonstrator, but in many applications data come from multiple imperfect demonstrators with heterogeneous suboptimality levels. We study reward learning in this setting through a feasible-reward-set framework: for each demonstrator, we encode its declared suboptimality level as a linear constraint and intersect the resulting feasible sets across demonstrators. Our theoretical analysis shows that the joint feasible set shrinks monotonically as data are added, and we give an exact characterization of when a new demonstrator strictly tightens it. We further establish two recovery guarantees for the feasible reward set of the ground-truth optimal demonstrator: one bound depends on closeness to the optimal occupancy, while the other requires only sufficient coverage and no near-optimal demonstrator.
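The intersection idea in the abstract can be illustrated with a small sketch. The abstract does not state the exact form of the constraints, so the encoding below is an assumption: a single-state (bandit) problem in which each demonstrator is an action distribution `mu` with a declared suboptimality `eps`, and a reward vector `r` is treated as feasible for that demonstrator when `max_a r[a] - mu @ r <= eps`. The function names `demonstrator_constraints` and `joint_feasible` are hypothetical and are not taken from the paper.

```python
import numpy as np

def demonstrator_constraints(mu, eps):
    """Return (A, b) such that A @ r <= b encodes r[a] - mu @ r <= eps for every action a.

    Assumed encoding for a single-state problem; not the paper's exact formulation.
    """
    n = mu.shape[0]
    A = np.eye(n) - np.tile(mu, (n, 1))  # row a is e_a - mu
    b = np.full(n, eps)
    return A, b

def joint_feasible(r, demos, tol=1e-9):
    """True if r lies in the intersection of all demonstrators' feasible sets."""
    for mu, eps in demos:
        A, b = demonstrator_constraints(mu, eps)
        if np.any(A @ r > b + tol):
            return False
    return True

# Two hypothetical demonstrators over three actions: (action distribution, declared eps).
demos = [
    (np.array([0.5, 0.5, 0.0]), 0.3),
    (np.array([0.0, 0.2, 0.8]), 0.5),
]

r_a = np.array([1.0, 0.8, 0.0])  # consistent with the first demonstrator only
r_b = np.array([0.2, 0.3, 0.4])  # consistent with both demonstrators

print(joint_feasible(r_a, demos[:1]), joint_feasible(r_a, demos), joint_feasible(r_b, demos))
```

Because the joint set is an intersection, any reward that passes the check for all demonstrators also passes it for every subset of them, which is the monotone-shrinking behaviour the abstract describes. Here the second demonstrator rules out `r_a`, so in this toy instance it strictly tightens the set.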
