Detect Before You Leap: Mirage Detection in Vision-Language Models

arXiv cs.CV | 2026-06-02 | Authors: Sayeed Shafayet Chowdhury, Md. Shaown Miah

Abstract

arXiv:2606.00435v1

Vision-language models (VLMs) can produce confident visual answers even when the required visual evidence is missing, blank, or unrelated to the question. This failure mode, known as mirage (Asadi et al. 2026), is especially concerning in medical and document visual question answering, where plausible but visually ungrounded responses may be mistaken for image-based evidence. We study pre-release mirage detection: given an image-question pair, the goal is to determine whether a VLM should answer or abstain before producing a response. We propose Text-Conditioned Layer-wise Internal Alignment (TC-LIA), a model-agnostic method that probes patch-token representations across the layers of a CLIP ViT-H/14 vision encoder.
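The abstract does not spell out how TC-LIA computes its scores, so the following is only a rough illustration of the general idea of text-conditioned, layer-wise probing, not the authors' method. It assumes that per-layer patch tokens and a question embedding are already available as arrays in a shared embedding space (with a real CLIP encoder, intermediate-layer tokens would need an extra projection step). The function names `layerwise_alignment` and `should_answer`, the max-over-patches aggregation, and the 0.5 threshold are all hypothetical choices made for this sketch.

```python
import numpy as np


def layerwise_alignment(patch_tokens, text_embedding, eps=1e-12):
    """Return one alignment score per layer (hypothetical aggregation).

    patch_tokens:   array of shape (num_layers, num_patches, dim)
    text_embedding: array of shape (dim,)
    The score for a layer is the largest cosine similarity between any
    patch token in that layer and the text embedding.
    """
    patch_norms = np.linalg.norm(patch_tokens, axis=-1, keepdims=True)
    unit_patches = patch_tokens / (patch_norms + eps)
    unit_text = text_embedding / (np.linalg.norm(text_embedding) + eps)
    cosine = unit_patches @ unit_text  # shape: (num_layers, num_patches)
    return cosine.max(axis=1)          # shape: (num_layers,)


def should_answer(layer_scores, threshold=0.5):
    """Answer only if the mean layer score clears an assumed threshold."""
    return bool(layer_scores.mean() >= threshold)


# Synthetic stand-ins for encoder outputs; no real model is involved.
rng = np.random.default_rng(0)
num_layers, num_patches, dim = 4, 16, 8
text = rng.normal(size=dim)
unit_text = text / np.linalg.norm(text)
noise = rng.normal(size=(num_layers, num_patches, dim))

# "Unrelated" image: remove every patch's component along the text direction.
unrelated = noise - (noise @ unit_text)[..., None] * unit_text

# "Grounded" image: same patches, but one patch per layer matches the text.
grounded = unrelated.copy()
grounded[:, 0, :] = text

grounded_scores = layerwise_alignment(grounded, text)
unrelated_scores = layerwise_alignment(unrelated, text)
print("answer on grounded image:", should_answer(grounded_scores))
print("answer on unrelated image:", should_answer(unrelated_scores))
```

In this toy setup the decision is made from the image-question pair alone, before any response is generated, which mirrors the answer-or-abstain framing described above. A learned classifier over the per-layer scores would be a natural replacement for the fixed threshold.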
