Learning Self-Correction in Vision-Language Models via Rollout Augmentation (Event)

PRODUCT_LAUNCH | 2026-06-05 | Impact: MEDIUM

arXiv:2602.08503v2 (Announce Type: replace)

Abstract: Self-correction is essential for solving complex reasoning problems in vision-language models (VLMs). However, existing reinforcement learning (RL) methods struggle to learn it, as effective self-correction behaviors emerge only rarely, making learning signals extremely sparse. To address this challenge, we propose correction-specific rollouts (Octopus), an RL rollout
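The sparsity argument can be made concrete with a back-of-envelope calculation. The sketch below is not the paper's method (the abstract is cut off before describing Octopus). It assumes, hypothetically, that each sampled rollout independently exhibits an effective self-correction with probability `p`, and that `G` rollouts are sampled per prompt. The function names `prob_group_has_correction` and `rollouts_needed` are illustrative only.

```python
import math


def prob_group_has_correction(p: float, group_size: int) -> float:
    """Probability that at least one of `group_size` independent rollouts
    shows an effective self-correction, assuming a per-rollout rate `p`
    (a hypothetical modeling assumption, not from the paper)."""
    if not 0.0 <= p <= 1.0 or group_size < 0:
        raise ValueError("p must be in [0, 1] and group_size must be >= 0")
    return 1.0 - (1.0 - p) ** group_size


def rollouts_needed(p: float, target: float) -> int:
    """Smallest group size G such that P(at least one self-correction) >= target.

    Solves 1 - (1 - p)^G >= target for G, i.e. G >= log(1 - target) / log(1 - p).
    """
    if not 0.0 < p <= 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p must be in (0, 1] and target must be in (0, 1)")
    if p == 1.0:
        return 1  # every rollout self-corrects, so one sample suffices
    # log1p(-x) computes log(1 - x) accurately for small x
    return math.ceil(math.log1p(-target) / math.log1p(-p))


if __name__ == "__main__":
    # Illustrative per-rollout rates; these are NOT measured values from the paper.
    for p in (0.001, 0.01, 0.05):
        print(f"p={p}: P(group of 8 has >= 1 correction) = "
              f"{prob_group_has_correction(p, 8):.3f}, "
              f"rollouts for even odds = {rollouts_needed(p, 0.5)}")
```

For example, with a hypothetical rate of p = 0.01, a group of 8 rollouts contains at least one self-correction less than 8% of the time, and about 69 rollouts per prompt are needed just to reach even odds, which illustrates why a learning signal tied to rare behaviors can be sparse.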