Addressing Exacerbated Attention Sink for Source-Free Cross-Domain Few-Shot Learning

arXiv cs.CV | 2026-05-26 | Authors: Shuai Yi, Yixiong Zou, Yuhua Li, Ruixuan Li

Abstract

arXiv:2605.25799v1 (announce type: new). Vision-language models (VLMs) such as CLIP have shown impressive generalization capabilities, yet their potential for Cross-Domain Few-Shot Learning (CDFSL), where the model must transfer source-domain information to target domains with scarce training data, remains underexplored. While the attention sink phenomenon has been observed in VLMs for certain tasks, its role in CDFSL scenarios has not been studied. In this paper, we uncover a critical issue overlooked by prior work: standard target-domain few-shot fine-tuning in CDFSL significantly exacerbates the attention sink problem, leading to poor discriminability across classes. Through extensive experiments, we interpret this phenomenon as the model's shortcut learning for domain adaptation: to overcome the large domain gap between the source and target domains, the model tends to push tokens that are initially closer to target-domain classes (i.e.