Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards

ArXiv CS.CL | 2026-06-02 | Authors: Yiran Shen, Yu Xia, Jonathan Chang, Prithviraj Ammanabrolu

Abstract

arXiv:2510.01167v2. Aligning large language models to human preferences is inherently multidimensional, yet most pipelines collapse heterogeneous signals into a single objective. We seek to answer what it would take to simultaneously align a model across various domains spanning those with verifiable rewards, non-verifiable subjective preferences, and complex interactive scenarios. Such multi-objective alignment setups are often plagued by individual objectives being at odds with each other, resulting in inefficient training and limited user control during inference. To address these issues, we propose Multi-Action-Head ALignment with PRM-guided DecOding (MAHALO), a unified framework that standardizes PRM training across verifiable and non-verifiable settings for step-level supervision, performs vectorized multi-objective alignment with Multi-Action-Head DPO, and enables…

The abstract may be incomplete; see the original paper.