Cross-Modal Attention Calibration for LVLM Hallucination Mitigation

Source: arXiv cs.CV | Date: 2026-06-01 | Language: en
Authors: Jiaming Li, Jiacheng Zhang, Zequn Jie, Lin Ma, Guanbin Li

Abstract

arXiv:2501.01926v3 (Announce Type: replace)

Large vision-language models (LVLMs) have shown remarkable capabilities in visual-language understanding. Despite this success, LVLMs still produce hallucinations in complex generation tasks, yielding content that is inconsistent with the visual input. To address this issue, some approaches introduce inference-time interventions, such as contrastive decoding, that reduce overreliance on language priors. However, these approaches overlook hallucinations that stem from position bias and spurious inter-modality correlations. In this paper, we propose Cross-Modal Attention Calibration (CMAC), a training-free method for mitigating hallucinations in LVLMs. Within CMAC, we design an Inter-Modality Decoding (IMD) module that alleviates hallucination through a novel contrastive decoding mechanism.
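
The abstract positions CMAC against earlier contrastive-decoding interventions. As a hedged illustration of that baseline idea only (this is not the paper's IMD module, whose details are not given here), the sketch below scores each candidate token by amplifying its log-probability under the full image-plus-text input and subtracting its log-probability under a weakened, prior-dominated input, similar in form to visual contrastive decoding. A plausibility cutoff keeps only tokens the full input already finds likely. The function names `contrastive_decode_step` and `log_softmax`, the parameters `alpha` and `beta`, and their defaults are all hypothetical.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def contrastive_decode_step(
    logits_full: List[float],
    logits_contrast: List[float],
    alpha: float = 1.0,
    beta: float = 0.1,
) -> int:
    """Pick the next token greedily under a generic contrastive decoding rule.

    Hypothetical sketch, NOT the CMAC/IMD mechanism.
    logits_full:     next-token logits given the original image and text.
    logits_contrast: next-token logits given a weakened input (for example a
                     distorted or missing image), assumed to lean on language priors.
    alpha:           contrast strength (illustrative default).
    beta:            plausibility cutoff in (0, 1], relative to the top
                     probability of the full-input distribution.
    """
    if len(logits_full) != len(logits_contrast):
        raise ValueError("both logit vectors must cover the same vocabulary")
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")

    lp_full = log_softmax(logits_full)
    lp_contrast = log_softmax(logits_contrast)

    # Adaptive plausibility constraint: discard tokens whose full-input
    # probability is below beta times the most likely token's probability.
    cutoff = max(lp_full) + math.log(beta)

    scores = []
    for a, b in zip(lp_full, lp_contrast):
        if a < cutoff:
            scores.append(-math.inf)
        else:
            # Reward evidence from the full input; penalize what the
            # prior-dominated contrast input already predicts on its own.
            scores.append((1.0 + alpha) * a - alpha * b)

    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    full = [2.0, 1.9, -5.0]
    prior_heavy = [3.0, 0.0, -5.0]
    print(contrastive_decode_step(full, prior_heavy, alpha=0.0))  # 0: plain greedy choice
    print(contrastive_decode_step(full, prior_heavy))             # 1: contrast flips the choice
    print(contrastive_decode_step(full, [3.0, 0.0, -50.0]))       # 1: token 2 masked by beta
```

In this toy example the full input slightly prefers token 0, but the prior-heavy input prefers it far more strongly, so the contrastive score switches the choice to token 1. The `beta` cutoff matters in the third call: without it, token 2 would win only because the contrast input assigns it an even lower probability, even though the full input considers it implausible.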
