AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models
Event type: PRODUCT_LAUNCH. Date: 2026-05-26. Impact: MEDIUM
arXiv:2605.26013v1. Announce Type: cross.
Abstract: We introduce AdvantageFlow, a forward-process reinforcement learning algorithm for rectified flow models. Unlike Flow-GRPO, which optimizes the reverse process, we optimize an advantage-weighted forward-process prediction loss. This optimization problem is unstable when advantages are negative and the loss becomes non-convex. We stabilize it by rollout policy regularization, w
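To make the phrase "advantage-weighted forward-process prediction loss" concrete, here is a minimal sketch in plain Python. It is not the paper's implementation, and the abstract is truncated before it describes the regularizer, so that part is left out entirely. The sketch assumes the common rectified-flow convention x_t = (1 - t) * x0 + t * x1 with x0 a generated sample and x1 Gaussian noise, a velocity target of x1 - x0, and mean-centred group rewards as advantages. All function names (`interpolate`, `velocity_target`, `group_advantages`, `advantage_weighted_loss`) are hypothetical.

```python
def interpolate(x0, x1, t):
    """Assumed forward process: straight line from sample x0 (t=0) to noise x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Velocity of the straight path above: d x_t / dt = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def group_advantages(rewards):
    """Assumed advantage definition: rewards centred on the group mean."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def advantage_weighted_loss(predict, samples, noises, ts, advantages):
    """Batch mean of A_i * ||predict(x_t, t) - (x1 - x0)||^2.

    `predict` is any callable (x_t, t) -> velocity estimate, standing in
    for the flow model. No regularization term is included.
    """
    total = 0.0
    for x0, x1, t, adv in zip(samples, noises, ts, advantages):
        x_t = interpolate(x0, x1, t)
        pred = predict(x_t, t)
        target = velocity_target(x0, x1)
        sq_err = sum((p - q) ** 2 for p, q in zip(pred, target))
        total += adv * sq_err
    return total / len(samples)


def zero_model(x_t, t):
    """Toy stand-in for the network: always predicts zero velocity."""
    return [0.0 for _ in x_t]


# One sample with positive advantage: the loss penalises prediction error.
loss_pos = advantage_weighted_loss(zero_model, [[0.0, 0.0]], [[2.0, 4.0]], [0.5], [1.0])
# Same sample with negative advantage: larger error now *lowers* the loss.
loss_neg = advantage_weighted_loss(zero_model, [[0.0, 0.0]], [[2.0, 4.0]], [0.5], [-1.0])
```

With a negative advantage the squared-error term enters with a negative sign, so minimizing it pushes the prediction away from the target without bound. This is one way to read the instability the abstract mentions, and presumably what the rollout policy regularization is meant to address.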
Source: arXiv cs.CV, 2026-05-26