Configurable Reward Model for Balanced Safety Alignment

ArXiv CS.CL · 2026-06-01 · NEWS · en · Authors: Zhengping Jiang, Mehran Khodabandeh, Akash Bharadwaj, Manik Bhandari, Mayur Srungarapu, Anqi Liu, Benjamin Van Durme, Li Chen
