Reward Learning through Ranking Mean Squared Error (Article)

ArXiv CS.AI | 2026-06-06 | NEWS | en | Authors: Chaitanya Kharyal, Calarina Muslimani, Matthew E. Taylor

Details

Source: ArXiv CS.AI
Authors: Chaitanya Kharyal, Calarina Muslimani, Matthew E. Taylor
Article type: NEWS
Language: en
Publication date: 2026-06-06

Abstract

arXiv:2601.09236v3 (announce type: replace-cross)

Reward design remains a significant bottleneck in applying reinforcement learning (RL) to real-world problems. A popular alternative is reward learning, where reward functions are inferred from human feedback rather than manually specified. Recent work has proposed learning reward functions from human ratings rather than traditional binary preferences, enabling richer and potentially less cognitively demanding supervision. Building on this paradigm, we introduce a new rating-based RL method, Ranked Return Regression for RL (R4). At its core, R4 uses a novel ranking mean squared error loss that learns from a dataset of trajectory-rating pairs, treating the human-provided discrete ratings (e.g., bad, neutral, good) as ordinal targets. Unlike prior rating-based approaches, R4 offers formal guarantees: its solution set is provably minimal and complete under mild assumptions.
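The abstract does not state the loss formula, so the following is only an illustrative sketch of the general idea of regressing ranks of predicted returns onto ordinal rating targets. It is not the R4 loss from the paper. The ordinal encoding `RATING_TO_ORDINAL`, the sigmoid-based `soft_ranks` helper, the `temperature` parameter, and the rescaling of ranks to the rating range are all assumptions introduced here for illustration.

```python
import numpy as np

# Hypothetical ordinal encoding of the rating classes named in the abstract.
RATING_TO_ORDINAL = {"bad": 0, "neutral": 1, "good": 2}


def soft_ranks(returns, temperature=1.0):
    """Smooth rank of each return: roughly the number of items it exceeds.

    Assumption: a pairwise-sigmoid relaxation, chosen here only because it
    is simple; the paper may use a different ranking operator.
    """
    diffs = (returns[:, None] - returns[None, :]) / temperature
    pairwise = 1.0 / (1.0 + np.exp(-diffs))
    # Each item compares with itself at sigmoid(0) = 0.5; remove that term
    # so the lowest return gets a rank close to 0.
    return pairwise.sum(axis=1) - 0.5


def ranking_mse(predicted_returns, ratings, temperature=1.0):
    """Mean squared error between rescaled soft ranks and ordinal ratings."""
    returns = np.asarray(predicted_returns, dtype=float)
    if returns.size < 2:
        raise ValueError("need at least two trajectories to rank")
    targets = np.array([RATING_TO_ORDINAL[r] for r in ratings], dtype=float)
    ranks = soft_ranks(returns, temperature)
    # Map ranks from [0, n - 1] onto the rating scale [0, max ordinal].
    max_ordinal = max(RATING_TO_ORDINAL.values())
    scaled = ranks / (returns.size - 1) * max_ordinal
    return float(np.mean((scaled - targets) ** 2))


# Predicted returns ordered consistently with the ratings give a small loss;
# the reversed ordering gives a large one.
ratings = ["bad", "neutral", "good"]
consistent = ranking_mse([-10.0, 0.0, 10.0], ratings)
reversed_order = ranking_mse([10.0, 0.0, -10.0], ratings)
print(consistent < reversed_order)
```

In this sketch, `predicted_returns` stands in for the summed outputs of a learned reward model over each trajectory. With an automatic-differentiation framework, the same computation could be minimized with respect to the reward model's parameters.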
