Parallel Tempering Initial Sampling in Inference-Time Reward Alignment

ArXiv CS.CV | 2026-06-01 | Authors: Myeongjun Oh, Gwangho Kim, Sungyoon Lee

Abstract

arXiv:2605.30991v1 (announce type: cross)

Inference-time reward alignment steers pretrained diffusion and flow-based generative models to satisfy user-specified rewards without retraining. Recently, Sequential Monte Carlo (SMC) has emerged as a powerful framework for this task by iteratively filtering and propagating multiple particles. However, we show that standard SMC-based methods often suffer from poor performance because they initialize particles from a standard prior, whereas high-reward regions in complex reward landscapes are extremely rare. Further, we show that even recent reward-aware initial sampling approaches remain vulnerable to getting trapped in local modes, as complex reward landscapes are often multi-modal. To overcome these limitations, we propose PATHS (PArallel Tempering for High-complexity reward Sampling), a novel initialization method that couples multiple sampling chains through parallel tempering.
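For readers unfamiliar with parallel tempering, the following is a minimal sketch of the generic technique on a one-dimensional toy problem. It is not the PATHS algorithm, whose details the abstract does not give. Everything here is an assumption made for illustration: the bimodal `reward` function, the reward-tilted target proportional to N(x; 0, 1) · exp(β · r(x)), the random-walk Metropolis local move, the inverse-temperature ladder `betas`, and all function names. Chains with small β stay close to the prior and move freely; chains with large β concentrate on high-reward regions; swaps between adjacent chains let states found by the exploratory chains reach the reward-focused ones.

```python
import math
import random


def reward(x):
    # Hypothetical bimodal toy reward: a minor peak near x = -2, a major peak near x = +3.
    return 1.0 * math.exp(-(x + 2.0) ** 2 / 0.1) + 2.0 * math.exp(-(x - 3.0) ** 2 / 0.1)


def log_target(x, beta):
    # Unnormalized log density of the assumed tempered target:
    # standard-normal prior tilted by beta * reward(x).
    return -0.5 * x * x + beta * reward(x)


def swap_log_ratio(x_i, x_j, beta_i, beta_j):
    # Log Metropolis ratio for exchanging the states of chains i and j.
    # The prior terms cancel, leaving only the reward difference.
    return (beta_i - beta_j) * (reward(x_j) - reward(x_i))


def accept(rng, log_ratio):
    # Metropolis acceptance test; min(0, .) keeps exp() from overflowing.
    return rng.random() < math.exp(min(0.0, log_ratio))


def parallel_tempering(betas, n_steps=2000, step_size=0.5, seed=0):
    rng = random.Random(seed)
    # Every chain starts from a draw of the standard-normal prior.
    xs = [rng.gauss(0.0, 1.0) for _ in betas]
    for _ in range(n_steps):
        # Local move: one random-walk Metropolis step per chain at its own beta.
        for k, beta in enumerate(betas):
            proposal = xs[k] + step_size * rng.gauss(0.0, 1.0)
            if accept(rng, log_target(proposal, beta) - log_target(xs[k], beta)):
                xs[k] = proposal
        # Swap move: try to exchange states between each adjacent pair of chains.
        for k in range(len(betas) - 1):
            if accept(rng, swap_log_ratio(xs[k], xs[k + 1], betas[k], betas[k + 1])):
                xs[k], xs[k + 1] = xs[k + 1], xs[k]
    return xs


if __name__ == "__main__":
    ladder = [0.0, 1.0, 2.0, 4.0]  # assumed ladder; beta = 0 samples the prior itself
    for beta, x in zip(ladder, parallel_tempering(ladder)):
        print(f"beta={beta:.1f}  x={x:+.3f}  reward={reward(x):.3f}")
```

In this sketch the final state of the largest-β chain would play the role of a reward-aware initial particle. How PATHS chooses its temperature ladder, local moves, and swap schedule, and how it hands samples to SMC, is not specified in the abstract.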
