Leveraging Visual Signals for Robust Token-Level Uncertainty in Vision-Language Generation

arXiv cs.CV | 2026-05-27 | Authors: Joseph Hoche, David Brellmann, Gianni Franchi

Abstract

arXiv:2605.27136v1

Uncertainty quantification (UQ) remains a critical challenge in Large Vision Language Models (LVLMs) for reliable predictions and real-world deployment. However, most existing methods are adapted from the LLM literature and primarily focus on the language modality, leaving the contribution of visual information to LVLM uncertainty largely underexplored. In this paper, we investigate how LVLMs process visual information and whether this process can be used to improve uncertainty estimation. By analyzing hidden representations after the integration of visual features during the generation process, we observe that high-confidence predictions rely more heavily on visual content than uncertain ones.
