Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success

PRODUCT_LAUNCH · 2026-06-04 · Impact: MEDIUM

arXiv:2601.18175v2 · Announce Type: replace

Abstract: A widely used technique for improving policies is success conditioning, in which one collects trajectories, identifies those that achieve a desired outcome, and updates the policy to imitate the actions taken along successful trajectories. This principle appears under many names -- rejection sampling with SFT, goal-conditioned RL, Decision Transfor
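The three steps named in the abstract (collect trajectories, keep the successful ones, imitate their actions) can be illustrated with a minimal tabular sketch. This is not the paper's method or code: the function name `success_conditioned_policy`, the trajectory layout (a list of `(state, action)` steps paired with an outcome), and the toy data are all assumptions made for illustration. In the tabular case, maximum-likelihood imitation reduces to the empirical action frequencies per state over the successful trajectories.

```python
from collections import Counter, defaultdict


def success_conditioned_policy(trajectories, is_success):
    """Fit a tabular policy by imitating actions on successful trajectories.

    trajectories: list of (steps, outcome), where steps is a list of
                  (state, action) pairs. Hypothetical layout for this sketch.
    is_success:   predicate on the outcome marking a trajectory as successful.
    """
    counts = defaultdict(Counter)
    for steps, outcome in trajectories:
        if not is_success(outcome):
            continue  # discard unsuccessful trajectories
        for state, action in steps:
            counts[state][action] += 1

    # Tabular maximum-likelihood imitation: normalize counts per state.
    policy = {}
    for state, action_counts in counts.items():
        total = sum(action_counts.values())
        policy[state] = {a: n / total for a, n in action_counts.items()}
    return policy


# Toy data (invented for illustration): outcome 1 means the goal was reached.
trajectories = [
    ([("s0", "left"), ("s1", "up")], 1),
    ([("s0", "right"), ("s1", "up")], 0),
    ([("s0", "left"), ("s1", "down")], 1),
    ([("s0", "right"), ("s1", "down")], 1),
]

policy = success_conditioned_policy(trajectories, lambda outcome: outcome == 1)
```

With this toy data, three of the four trajectories are kept, so the fitted policy in `s0` puts probability 2/3 on `left` and 1/3 on `right`. States that never appear in a successful trajectory get no entry, which a practical implementation would need to handle (for example by falling back to the previous policy).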
