Reinforcement Learning with Robust Rubric Rewards

ArXiv CS.CV · 2026-05-29 · Authors: Ya-Qi Yu, Hao Wang, Fangyu Hong, Xiangyang Qu, Gaojie Wu, Qiaoyu Luo, Nuo Xu, Huixin Wang, Wuheng Xu, Yongxin Liao, Zihao Chen, Haonan Li, Ziming Li, Dezhi Peng, Minghui Liao, Jihao Wu, Haoyu Ren, Dandan Tu
