Uni-DPO: A Unified Paradigm for Dynamic Preference Optimization of LLMs

ArXiv CS.CV · 2026-05-26 · Authors: Shangpin Peng, Weinong Wang, Zhuotao Tian, Senqiao Yang, Xing Wu, Haotian Xu, Chengquan Zhang, Takashi Isobe, Baotian Hu, Min Zhang
