Cross-Modal Attention Calibration for LVLM Hallucination Mitigation

ArXiv cs.CV, 2026-06-01. Authors: Jiaming Li, Jiacheng Zhang, Zequn Jie, Lin Ma, Guanbin Li

Abstract

arXiv:2501.01926v3

Large vision-language models (LVLMs) have shown remarkable capabilities in vision-language understanding. Despite this success, LVLMs still hallucinate in complex generation tasks, producing content that is inconsistent with the visual input. To address this issue, some approaches introduce inference-time interventions, such as contrastive decoding, to reduce overreliance on language priors. However, these approaches overlook hallucinations that stem from position bias and spurious inter-modality correlations. In this paper, we propose Cross-Modal Attention Calibration (CMAC), a training-free method for mitigating hallucinations in LVLMs. As part of this method, we design an Inter-Modality Decoding (IMD) module that alleviates hallucination through a novel contrastive decoding mechanism.
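The abstract refers to contrastive decoding without spelling it out. As a rough illustration only, the sketch below shows one decoding step of a generic contrastive scheme in the style of earlier inference-time methods: logits from the full input are contrasted against logits from a degraded input that leans on language priors. This is not the paper's IMD module, whose details are not given here. The function names, the weight `alpha`, and the plausibility threshold `beta` are all assumptions made for the example.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_decode_step(logits_full, logits_contrast, alpha=1.0, beta=0.1):
    """Return the next-token id under a generic contrastive decoding rule.

    logits_full:     logits conditioned on the real image and the prompt
    logits_contrast: logits from a degraded input (hypothetical, e.g. a
                     blurred image) that exposes the language prior
    alpha:           contrast strength (assumed); 0 reduces to greedy decoding
    beta:            plausibility cutoff (assumed); tokens whose full-input
                     probability is below beta * max probability are skipped
    """
    logp_full = log_softmax(logits_full)
    logp_con = log_softmax(logits_contrast)
    cutoff = math.log(beta) + max(logp_full)

    best_id, best_score = None, -math.inf
    for i, (lf, lc) in enumerate(zip(logp_full, logp_con)):
        if lf < cutoff:
            continue  # implausible under the full input, never promote it
        # Reward tokens supported by the image, penalize prior-driven ones.
        score = (1 + alpha) * lf - alpha * lc
        if score > best_score:
            best_id, best_score = i, score
    return best_id
```

In this toy setup, a token that the degraded input favors strongly is penalized, so a slightly less likely token that depends on the image can win. The plausibility cutoff keeps the subtraction from promoting tokens that the full model itself considers very unlikely.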
