PAFO: Pareto Fairness Optimization for Personalized Reward Modeling
PRODUCT_LAUNCH | 2026-06-09 | Impact: MEDIUM
arXiv:2606.07988v1 Announce Type: new
Abstract: Large language models (LLMs) increasingly rely on reward models to align their outputs with diverse user preferences. While personalized reward models aim to capture such heterogeneity, they are often trained on imbalanced user preference data and may therefore favor users whose preferences are more common in the training population. In this paper, we identify this failure mode as
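The abstract's core claim is that a reward model trained on imbalanced preference data may favor the majority of users. A toy example makes the mechanism concrete. The abstract is cut off before it describes PAFO's method, so this sketch does not implement PAFO. It is a hypothetical illustration under stated assumptions: a single shared Bradley-Terry weight (not a personalized model), made-up data with a 90/10 group split, and illustrative helper names (`fit_shared_reward`, `per_group_loss`, `per_group_accuracy`). The group-balanced weighting at the end is a swapped-in contrast, not the paper's technique.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def bt_loss(w: float, d: float) -> float:
    """Bradley-Terry negative log-likelihood of one pair: -log sigmoid(w * d)."""
    return math.log1p(math.exp(-w * d))

# Hypothetical toy data (assumption): each entry is the feature difference
# phi(chosen) - phi(rejected). The majority group (90 pairs) prefers the
# higher-feature response; the minority group (10 pairs) prefers the lower one.
GROUPS = {"majority": [1.0] * 90, "minority": [-1.0] * 10}

def fit_shared_reward(groups, weights=None, lr=0.5, steps=2000):
    """Fit one shared scalar reward weight w by gradient descent.

    weights=None is plain ERM: every pair counts equally, so the larger
    group dominates. Otherwise `weights` maps group name -> weight on that
    group's mean loss (e.g. equal weights for a group-balanced objective).
    """
    w = 0.0
    total = sum(len(diffs) for diffs in groups.values())
    for _ in range(steps):
        grad = 0.0
        for name, diffs in groups.items():
            gw = len(diffs) / total if weights is None else weights[name]
            # d/dw of mean(-log sigmoid(w*d)) is mean(-d * sigmoid(-w*d)).
            g = sum(-d * sigmoid(-w * d) for d in diffs) / len(diffs)
            grad += gw * g
        w -= lr * grad
    return w

def per_group_loss(w, groups):
    """Mean Bradley-Terry loss for each group under reward weight w."""
    return {n: sum(bt_loss(w, d) for d in ds) / len(ds) for n, ds in groups.items()}

def per_group_accuracy(w, groups):
    """Fraction of each group's pairs where the chosen response scores higher."""
    return {n: sum(1 for d in ds if w * d > 0) / len(ds) for n, ds in groups.items()}

if __name__ == "__main__":
    w_erm = fit_shared_reward(GROUPS)
    print("ERM w:", round(w_erm, 3))                       # ERM w: 2.197
    print("accuracy:", per_group_accuracy(w_erm, GROUPS))  # {'majority': 1.0, 'minority': 0.0}
    print("loss:", {k: round(v, 3) for k, v in per_group_loss(w_erm, GROUPS).items()})
    # loss: {'majority': 0.105, 'minority': 2.303}

    w_bal = fit_shared_reward(GROUPS, weights={"majority": 0.5, "minority": 0.5})
    print("balanced loss:", {k: round(v, 3) for k, v in per_group_loss(w_bal, GROUPS).items()})
    # balanced loss: {'majority': 0.693, 'minority': 0.693}
```

In this toy model the majority loss strictly decreases in `w` and the minority loss strictly increases, so every `w` is Pareto-optimal for the two groups. Pooled ERM (converging to `w = ln 9`) simply lands on the point that serves the majority and gets every minority pair wrong. That is one way to read why fairness across groups needs an explicit criterion beyond average loss.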
Related coverage (1):
PAFO: Pareto Fairness Optimization for Personalized Reward Modeling
ArXiv CS.AI, 2026-06-09