On Asymmetric Optimization of Reasoning and Perception in Vision-Language Model Post-Training

Source: ArXiv CS.CV | Published: 2026-05-29 | Type: NEWS | Language: en | Authors: Xueqing Wu, Yu-Chi Lin, Kai-Wei Chang, Nanyun Peng


Abstract

arXiv:2605.29496v1 (announce type: cross). Post-training has greatly improved reasoning in frontier vision-language models, yet its gains for perception remain comparatively limited, creating a bottleneck for end-to-end visual reasoning. To investigate this gap, we introduce a controlled diagnostic framework with two synthetic tasks that disentangle perception from reasoning. Our analysis reveals a consistent perception-reasoning asymmetry: post-training improves reasoning more substantially than perception, though the underlying mechanism differs by training paradigm. For supervised fine-tuning (SFT), this asymmetry stems from token imbalance in chain-of-thought supervision, where perception occupies fewer tokens and thus receives a weaker training signal. Dynamically reweighting the loss mitigates this imbalance and boosts end-to-end performance by up to 18.2.
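
The token-imbalance argument for SFT can be illustrated with a small sketch. The abstract does not specify the reweighting rule, so the code below swaps in a simpler static, per-segment normalization as an assumed stand-in for the paper's dynamic scheme. The function names `segment_balanced_weights` and `weighted_loss`, the segment labels, and the example loss values are all hypothetical.

```python
def segment_balanced_weights(segment_ids):
    """Return one weight per token so that every segment gets equal total weight.

    Assumption: each chain-of-thought token is tagged with a segment label
    (for example "perception" or "reasoning"). This is an illustrative
    static scheme, not the dynamic reweighting described in the paper.
    """
    counts = {}
    for seg in segment_ids:
        counts[seg] = counts.get(seg, 0) + 1
    num_segments = len(counts)
    # Each segment's weights sum to 1 / num_segments, so all weights sum to 1.
    return [1.0 / (num_segments * counts[seg]) for seg in segment_ids]


def weighted_loss(token_losses, weights):
    """Weighted sum of per-token losses (weights are assumed to sum to 1)."""
    return sum(w * l for w, l in zip(weights, token_losses))


# Hypothetical example: 2 perception tokens followed by 8 reasoning tokens.
segments = ["perception"] * 2 + ["reasoning"] * 8
losses = [2.0, 2.0] + [0.5] * 8

# Standard SFT averages uniformly over tokens, so each segment's share of
# the training signal scales with its token count.
uniform = weighted_loss(losses, [1.0 / len(losses)] * len(losses))

# Balanced weighting gives the short perception segment the same total
# weight as the long reasoning segment.
balanced = weighted_loss(losses, segment_balanced_weights(segments))

print(uniform, balanced)
```

Under uniform averaging, each perception token carries a weight of 0.1, the same as each reasoning token, so the two perception tokens together receive only 20% of the total weight. Under the balanced scheme, each perception token carries a weight of 0.25 and the perception segment receives half of the total weight.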
