Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning 文章

ArXiv CS.CV2026-05-26NEWSen作者: Fanhu Zeng, Zhicong Luo, Zefan Wang, You Li, Chi Chen, Maosong Sun

摘要

arXiv:2605.25437v1 Announce Type: new Abstract: Visual reasoning through reinforcement learning with verifiable rewards (RLVR) has achieved remarkable progress. However, when dealing with multi-source inputs, existing approaches tend to treat them as a mere accumulation of information, lacking explicit mechanisms to distinguish whether integrating additional sources yields information gain or introduces interference. Therefore, they struggle to effectively model dynamic interaction when integrating multiple sources, particularly when they differ significantly in physical properties and semantics, e.g., infrared and depth, leading to inferior performance to mono-source reasoning when a certain source holds the dominant signal. To address this issue, we propose MARS, a novel mono-anchored multi-source reasoning framework that models each visual modality as an independent information source.

相关公司

暂无数据

相关人物

暂无数据