Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization

ArXiv CS.AI · 2026-06-02 · NEWS · en · Author: Hyung Gyu Rho
