DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning

arXiv cs.CL · 2026-05-26 · Authors: Guochao Jiang, Jingyi Song, Guofeng Quan, Chuzhan Hao, Guohua Liu, Yuewei Zhang
