MARS: Margin and Semantic-Aware Data Augmentation for Reward Modeling

ArXiv CS.AI | 2026-05-26 | Authors: Payel Bhattacharjee, Osvaldo Simeone, Ravi Tandon

Abstract

arXiv:2602.17658v2

Reward modeling is central to alignment pipelines such as RLHF, RLAIF, and PPO-based policy optimization, yet its reliability is constrained by limited and heterogeneous human preference data that are expensive to collect at scale. Synthetic augmentation can expand preference supervision, but existing methods often augment uniformly or at the representation level, without targeting the examples on which the reward model is uncertain or prone to mis-ranking. In this paper, we introduce MARS (Margin and Semantic-Aware Data Augmentation for Reward Modeling), an adaptive augmentation framework that prioritizes low-margin preference pairs and uses semantic distance as a second refinement layer to enhance the contrast between the chosen and rejected responses.
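The abstract does not give the exact selection rule, so the following is only a minimal sketch of the two-stage idea it describes, under several assumptions. It assumes reward-model scores and response embeddings are already computed. A pair counts as "low-margin" when its signed margin, r(chosen) - r(rejected), falls below a threshold, so mis-ranked pairs with negative margins are included. Cosine distance stands in for whatever semantic distance MARS actually uses. Among low-margin pairs, those whose chosen and rejected responses are semantically closest are ranked first; this ordering is also an assumption. The names `ScoredPair`, `select_for_augmentation`, `margin_threshold`, and `budget` are hypothetical. The sketch only selects candidates and does not generate augmented responses.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ScoredPair:
    """Hypothetical container: one preference pair with scores and embeddings."""
    pair_id: str
    reward_chosen: float           # reward model score for the chosen response
    reward_rejected: float         # reward model score for the rejected response
    emb_chosen: Sequence[float]    # embedding of the chosen response (assumed given)
    emb_rejected: Sequence[float]  # embedding of the rejected response (assumed given)

    @property
    def margin(self) -> float:
        # Signed reward margin; a negative value means the pair is currently mis-ranked.
        return self.reward_chosen - self.reward_rejected


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0.0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat zero vectors as carrying no similarity signal
    return 1.0 - dot / (norm_u * norm_v)


def select_for_augmentation(
    pairs: List[ScoredPair], margin_threshold: float, budget: int
) -> List[str]:
    """Two-stage selection sketch (assumed rule, not the paper's exact method).

    Stage 1 keeps pairs whose signed margin is below `margin_threshold`.
    Stage 2 sorts the survivors by chosen/rejected cosine distance, smallest
    first, breaking ties by margin, and returns at most `budget` pair ids.
    """
    low_margin = [p for p in pairs if p.margin < margin_threshold]
    low_margin.sort(
        key=lambda p: (cosine_distance(p.emb_chosen, p.emb_rejected), p.margin)
    )
    return [p.pair_id for p in low_margin[:budget]]


# Usage with toy numbers:
pairs = [
    ScoredPair("a", 2.0, -1.0, [1.0, 0.0], [0.0, 1.0]),  # margin 3.0: confident, filtered out
    ScoredPair("b", 0.1, 0.0, [1.0, 0.0], [0.9, 0.1]),   # margin 0.1, near-identical embeddings
    ScoredPair("c", -0.5, 0.2, [1.0, 0.0], [0.0, 1.0]),  # margin -0.7 (mis-ranked), distant embeddings
]
print(select_for_augmentation(pairs, margin_threshold=0.5, budget=2))  # ['b', 'c']
```

Filtering on margin first keeps the augmentation budget on pairs the reward model is unsure about. The semantic-distance step only reorders that subset, which matches the abstract's description of semantic distance as a second layer rather than a standalone criterion.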
