Are we really tilting? The mechanics of reward guidance in flow and diffusion models

ArXiv CS.AI | 2026-06-03 | Authors: Sanjit Dandapanthula, Nicholas M. Boffi

Abstract

arXiv:2606.02884v1 (announce type: cross)

Reward guidance algorithms steer a learned generative process toward the reward-tilted measure at inference time. While empirically powerful, these methods are prone to reward hacking: the guided model over-optimizes the reward at the cost of fidelity to the learned distribution. Prior work has attributed this to the complexity of neural reward functions or implicit biases in diffusion training, but its fundamental origins remain poorly understood. We show that reward hacking arises from an approximation made in most practical implementations of reward-guided diffusion (finite-particle plug-in estimation of the Doob h-function), even in the simplest non-trivial settings of Gaussian and Gaussian mixture targets with quadratic rewards. In closed form, we isolate two distinct failure modes of the plug-in estimator: it leads to reward hacking within each mode and it cannot select high-reward modes.
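To make the Gaussian setting with a quadratic reward concrete, here is a minimal sketch of what the reward-tilted target looks like in one dimension. It assumes the usual definition of the tilted measure as proportional to p(x) exp(r(x)), which the abstract does not spell out, and it uses a hypothetical reward r(x) = -0.5 * lam * (x - a)^2 with illustrative parameter names (`mu`, `sigma2`, `a`, `lam`). This is not the paper's plug-in h-function estimator. It only computes the closed-form tilted Gaussian and checks it with self-normalized importance sampling from the base distribution.

```python
import math
import random


def tilted_gaussian(mu, sigma2, a, lam):
    """Closed-form mean and variance of the density proportional to
    p(x) * exp(r(x)), with p = N(mu, sigma2) and the (assumed) quadratic
    reward r(x) = -0.5 * lam * (x - a)**2."""
    precision = 1.0 / sigma2 + lam  # precisions add under a quadratic tilt
    mean = (mu / sigma2 + lam * a) / precision
    return mean, 1.0 / precision


def snis_tilted_mean(mu, sigma2, a, lam, n, seed=0):
    """Self-normalized importance-sampling estimate of the tilted mean,
    using n samples drawn from the base Gaussian p."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    log_w = [-0.5 * lam * (x - a) ** 2 for x in xs]  # log weights = reward
    shift = max(log_w)  # subtract the max for numerical stability
    ws = [math.exp(lw - shift) for lw in log_w]
    total = sum(ws)
    return sum(w * x for w, x in zip(ws, xs)) / total


# Base N(0, 1), reward peaked at a = 2 with strength lam = 1.
exact_mean, exact_var = tilted_gaussian(mu=0.0, sigma2=1.0, a=2.0, lam=1.0)
estimate = snis_tilted_mean(mu=0.0, sigma2=1.0, a=2.0, lam=1.0, n=50_000)
print(exact_mean, exact_var)  # exact tilted mean 1.0 and variance 0.5
print(estimate)               # Monte Carlo estimate of the tilted mean
```

With these parameters the exact tilt moves the mean halfway toward the reward peak and halves the variance, which gives a reference point against which any guided sampler's output in this toy setting could be compared.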
