Prompt-Level Reward Specifications for Open-Ended Post-Training

ArXiv CS.CL · 2026-05-29 · NEWS · en · Authors: Zijun Weng, Xiaohui Hu, Shuangyong Song, Yongxiang Li, Kaidong Yu, Xuanjing Huang
