Language Bias in LVLMs: From In-Depth Analysis to Simple and Effective Mitigation

arXiv cs.CL, 2026-05-26. Authors: Yangneng Chen, Jing Li

Abstract

arXiv:2605.25036v1 (announce type: new)

Large Vision-Language Models (LVLMs) extend large language models with visual understanding, but they remain vulnerable to hallucination, in which outputs are fluent yet inconsistent with the images. Recent studies link this issue to language bias: the tendency of LVLMs to over-rely on text while neglecting visual inputs. Yet most analyses remain empirical and do not uncover the underlying cause. In this paper, we provide a systematic study of language bias and identify its root in modality misalignment during training. Our analysis shows that both Visual Instruction Tuning (VIT) and Direct Preference Optimization (DPO) often prioritize textual improvements, which may cause LVLMs to lean too heavily toward language modeling rather than balanced multimodal understanding.
