Density-Aware Translation of Spurious Correlations in Zero-Shot VLMs (Article)

ArXiv CS.CV | 2026-06-02 | NEWS | en | Authors: Afsaneh Hasanebrahimi, Hanxun Huang, Christopher Leckie, Sarah Erfani

Details

Source site: ArXiv CS.CV
Authors: Afsaneh Hasanebrahimi, Hanxun Huang, Christopher Leckie, Sarah Erfani
Article type: NEWS
Language: en
Publication date: 2026-06-02

Abstract

arXiv:2606.01710v1 (announce type: new)

Vision-language models (VLMs), such as CLIP, achieve powerful zero-shot classification. However, their predictions remain sensitive to spurious correlations, where contextual cues dominate over semantic content. Earlier solutions typically rely on fine-tuning or prompt engineering, which either undermine the advantages of pre-trained models or are prone to hallucination. In this work, we propose Density-Aware Translation (DAT), which refines image-text similarity scores using a local geometric density term derived from group reference sets. Our approach is motivated by the observation that CLIP embeddings exhibit a modality gap and lie on an anisotropic shell in the feature space: common patterns cluster near the mean, while rare patterns are pushed outward. This geometry creates uneven alignment, where spurious correlations are amplified while semantically meaningful but rare cues are marginalised.
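The abstract does not give the DAT formula, so the following is only an illustrative sketch of the general idea of adjusting zero-shot similarity scores with a local density term computed from reference sets. Everything specific here is an assumption rather than the authors' method: the function names `knn_density` and `density_adjusted_scores`, the parameters `k` and `lam`, the use of one reference set per class prompt, the k-nearest-neighbour cosine density, and the additive combination.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length, as is usual for CLIP-style embeddings."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def knn_density(query, reference, k=5):
    """Hypothetical local density: mean cosine similarity between each
    query embedding and its k most similar reference embeddings."""
    sims = query @ reference.T                 # shape (n_query, n_ref)
    k = min(k, reference.shape[0])
    topk = np.sort(sims, axis=1)[:, -k:]       # k largest similarities per row
    return topk.mean(axis=1)                   # shape (n_query,)

def density_adjusted_scores(image_emb, text_emb, group_refs, k=5, lam=0.5):
    """Assumed scoring rule (not taken from the paper):
    score[i, c] = cos(image_i, text_c) + lam * density(image_i | refs_c).

    image_emb:  (n, d) image embeddings
    text_emb:   (c, d) class-prompt embeddings
    group_refs: list of c arrays, each (m_c, d), one reference set per class
    """
    image_emb = l2_normalize(image_emb)
    text_emb = l2_normalize(text_emb)
    base = image_emb @ text_emb.T              # plain zero-shot similarity
    density = np.stack(
        [knn_density(image_emb, l2_normalize(refs), k) for refs in group_refs],
        axis=1,
    )                                          # shape (n, c)
    return base + lam * density

# Toy usage: one image that is equally similar to both class prompts.
image = np.array([[1.0, 1.0]])
prompts = np.array([[1.0, 0.0], [0.0, 1.0]])
refs = [np.array([[1.0, 0.0], [1.0, 0.1]]),    # reference set for class 0
        np.array([[1.0, 1.0], [0.9, 1.0]])]    # reference set for class 1
scores = density_adjusted_scores(image, prompts, refs, k=2, lam=0.5)
predicted_class = int(scores.argmax(axis=1)[0])
```

In this toy setup the plain cosine scores are tied, so any preference between the two classes comes entirely from the density term. Setting `lam=0` recovers ordinary zero-shot scoring.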
