S-SPPO: Semantic-Calibrated Self-Play Preference Optimization
Source: arXiv cs.AI | Date: 2026-06-02 | Type: News | Language: en
Authors: Xiwen Chen, Wenhui Zhu, Jingjing Wang, Peijie Qiu, Zhipeng Wang, Huayu Li, ZhengXiao He, Xuanzhao Dong, Prayag Tiwari, Mingkun Xu, Yujian Xiong, Feng Luo, Abolfazl Razi, Brendan Hogan Rappazzo, Anderson Schneider, Yuriy Nevmyvaka