Boosting RL-Based Visual Reasoning with Selective Adversarial Entropy Intervention

ArXiv CS.AI | 2026-06-02 | News | en
Authors: Yang Yu, Zhuangzhuang Chen, Lanqing Li, Xiaomeng Li

Abstract

arXiv:2512.10414v2 (announce type: replace)

Recently, reinforcement learning (RL) has become a common choice for enhancing the reasoning capabilities of vision-language models (VLMs). Among existing RL-based finetuning methods, entropy intervention has turned out to be an effective way to improve exploratory ability, thereby improving policy performance. Notably, most existing studies intervene in entropy simply by controlling the update of specific tokens during policy optimization. They ignore entropy intervention during RL sampling, which can boost the performance of GRPO by improving the diversity of responses. In this paper, we propose Selective-adversarial Entropy Intervention (SaEI), which enhances policy entropy by distorting the visual input with a token-selective adversarial objective derived from the entropy of sampled responses.
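
The abstract does not say how tokens are selected or how the visual input is distorted, so the following is only a toy sketch of the general idea, not the authors' method. It assumes a hypothetical linear "policy" whose per-token logits depend on an input vector `x` standing in for the image, picks the highest-entropy fraction of tokens (an assumed selection rule), and takes one sign-gradient step on `x` to raise the mean entropy of those tokens, with the gradient estimated by finite differences. All function names and parameters (`fraction`, `eps`, `h`) are illustrative.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def token_entropies(weights, x):
    # weights[t][v] is the coefficient vector for vocabulary item v at token
    # position t. This linear toy policy is an assumption for illustration.
    result = []
    for token_w in weights:
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in token_w]
        result.append(entropy(softmax(logits)))
    return result

def select_tokens(entropies, fraction):
    # Assumed selection rule: keep the top `fraction` of tokens by entropy.
    k = max(1, int(len(entropies) * fraction))
    ranked = sorted(range(len(entropies)), key=lambda t: entropies[t], reverse=True)
    return sorted(ranked[:k])

def selective_objective(weights, x, selected):
    # Mean entropy over the selected token positions only.
    ents = token_entropies(weights, x)
    return sum(ents[t] for t in selected) / len(selected)

def sign(g):
    return (g > 0) - (g < 0)

def perturb_input(weights, x, fraction=0.5, eps=0.05, h=1e-5):
    # One sign-gradient ascent step on the input, bounded by eps per coordinate.
    # The token selection is fixed at the unperturbed input.
    selected = select_tokens(token_entropies(weights, x), fraction)
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        diff = selective_objective(weights, up, selected) - selective_objective(weights, down, selected)
        grad.append(diff / (2 * h))
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Toy usage: 2 token positions, vocabulary of 3, input of dimension 2.
toy_weights = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]],
    [[2.0, -1.0], [0.5, 0.5], [0.0, 0.0]],
]
toy_x = [0.5, -0.2]
toy_x_adv = perturb_input(toy_weights, toy_x)
```

In an actual RL pipeline the perturbed input would be used when sampling responses, so that the group of rollouts is more diverse; how that interacts with the GRPO update is not described in the abstract and is left out here.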
