DUEL: Adversarial Self-Play for Multimodal Reasoning

arXiv cs.CV | 2026-05-26 | News | Language: en
Authors: Lin Qiu, Hanqing Zeng, Yao Liu, Bingjun Sun, Guangdeng Liao, Ji Liu

Abstract

arXiv:2605.24794v1 (Announce Type: new)

Reinforcement learning (RL) has emerged as an effective paradigm for improving the reasoning capability of vision-language models (VLMs). However, RL-based optimization typically depends on costly high-quality annotations that are difficult to scale. Existing unsupervised alternatives may drift toward biased solutions due to weak visual grounding and the lack of reliable verification signals. We propose a self-evolving post-training framework, DUEL, where supervision emerges from adversarial interactions between two policies initialized from the same pretrained VLM. A Challenger generates an image-grounded true claim together with a minimally perturbed hard-negative counterpart, while a Solver verifies both claims against the image, encouraging fine-grained visual discrimination under near-neighbor semantics.
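The abstract names the two roles but does not state how pairs are validated or how rewards are computed. The Python sketch below illustrates one possible Challenger/Solver round under explicit assumptions: claims are plain strings, "minimally perturbed" is approximated by a token-level edit distance of at most `max_edits` (an assumed threshold), the Solver is any callable mapping (image, claim) to a true/false verdict, and the zero-sum-style rewards are illustrative choices, not the paper's method. All names (`ClaimPair`, `token_edit_distance`, `is_minimal_perturbation`, `solver_reward`, `challenger_reward`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical Solver interface: (image, claim) -> True if the claim is judged supported.
Verifier = Callable[[object, str], bool]


@dataclass
class ClaimPair:
    true_claim: str     # claim the Challenger asserts is supported by the image
    hard_negative: str  # near-neighbor counterpart that should be false


def token_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over whitespace-separated tokens."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, tok_x in enumerate(x, start=1):
        cur = [i] + [0] * len(y)
        for j, tok_y in enumerate(y, start=1):
            cur[j] = min(
                prev[j] + 1,                         # delete tok_x
                cur[j - 1] + 1,                      # insert tok_y
                prev[j - 1] + int(tok_x != tok_y),   # substitute (free if equal)
            )
        prev = cur
    return prev[-1]


def is_minimal_perturbation(pair: ClaimPair, max_edits: int = 2) -> bool:
    """Assumed validity test: the claims must differ, but only by a few tokens."""
    d = token_edit_distance(pair.true_claim, pair.hard_negative)
    return 0 < d <= max_edits


def solver_reward(solver: Verifier, image: object, pair: ClaimPair) -> float:
    """Illustrative Solver reward: fraction of the two claims labeled correctly."""
    correct = int(solver(image, pair.true_claim)) + int(not solver(image, pair.hard_negative))
    return correct / 2


def challenger_reward(solver: Verifier, image: object, pair: ClaimPair,
                      max_edits: int = 2) -> float:
    """Illustrative Challenger reward: valid pairs score by how much the Solver misses."""
    if not is_minimal_perturbation(pair, max_edits):
        return 0.0  # identical or overly different pairs earn nothing
    return 1.0 - solver_reward(solver, image, pair)


if __name__ == "__main__":
    # Toy "image": a set of facts that hold in the scene (stand-in for real pixels).
    scene = {"two red cups on a table"}
    pair = ClaimPair("two red cups on a table", "three red cups on a table")
    oracle = lambda img, claim: claim in img  # perfectly grounded Solver
    yes_man = lambda img, claim: True         # Solver that accepts everything
    print(solver_reward(oracle, scene, pair), challenger_reward(oracle, scene, pair))    # 1.0 0.0
    print(solver_reward(yes_man, scene, pair), challenger_reward(yes_man, scene, pair))  # 0.5 0.5
```

In this toy setup a Solver that accepts every claim earns only half credit, because the hard negative differs from the true claim by a single token ("two" vs. "three"). That is the kind of near-neighbor discrimination the abstract describes. Whether DUEL uses a zero-sum split like this, an edit-distance check, or some other validity filter is not stated in the abstract.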
