S-SPPO: Semantic-Calibrated Self-Play Preference Optimization (Event)

Name: S-SPPO: Semantic-Calibrated Self-Play Preference Optimization
Start: 2026-06-02

PRODUCT_LAUNCH · 2026-06-02 · Impact: MEDIUM

S-SPPO: Semantic-Calibrated Self-Play Preference Optimization
arXiv:2606.01561v1
Announce Type: new
Abstract: Aligning Large Language Models (LLMs) with human preferences is often formulated via Direct Preference Optimization (DPO). However, the standard Bradley-Terry instantiation of DPO is limited in modeling common departures from transitivity in human preferences. To address this, recent work has introduced Self-Play Preference Optimization (SPPO), which iteratively refines the policy by tr

Artificial Intelligence
