PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers

ArXiv CS.CV, 2026-06-01
Authors: Lukas Schiesser, Cornelius Wolff, Sophie Haas, Simon Pukrop

Abstract

arXiv:2506.14842v2

Building image classification models remains cumbersome in data-scarce domains, where collecting large labeled datasets is impractical. In-context learning (ICL) is a promising paradigm for few-shot image classification (FSIC), but prior work has underexplored the relative importance of encoder pretraining versus fusion-layer training data. We present PictSure, a family of vision-only ICL models that demonstrates the potential of easy-to-use fusion transformer architectures, as well as the need for better embedding representations across a wider range of image domains. In both in-domain and out-of-domain evaluations, we find that the representation quality induced by pretraining strongly correlates with downstream ICL performance.
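To make the setting concrete, the following is a minimal sketch of one few-shot ICL episode. It is not PictSure's architecture: the abstract does not specify the model, so a single softmax-attention step over the labeled support embeddings stands in here for a learned fusion transformer. The function names, the temperature, and the hand-made 2D "embeddings" are all assumptions chosen for illustration. The sketch only shows why embedding quality matters in this setting: the query's prediction is driven entirely by how its embedding relates to the support embeddings.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def icl_predict(support_embs, support_labels, query_emb, n_classes, temperature=0.1):
    """Toy stand-in for a fusion layer (assumption, not the paper's model).

    The query attends over the support set; attention weights are
    accumulated per class to give a class distribution.
    """
    scores = [cosine(query_emb, s) / temperature for s in support_embs]
    weights = softmax(scores)
    probs = [0.0] * n_classes
    for w, y in zip(weights, support_labels):
        probs[y] += w
    return probs


# A 2-way, 2-shot episode. In practice these vectors would come from a
# pretrained image encoder; the values below are hypothetical.
support_embs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
support_labels = [0, 0, 1, 1]
query_emb = [0.8, 0.3]

probs = icl_predict(support_embs, support_labels, query_emb, n_classes=2)
predicted = max(range(len(probs)), key=lambda c: probs[c])
print(predicted, probs)
```

In this toy episode the two classes are well separated in embedding space, so the query is assigned to class 0 with high confidence. If the encoder placed both classes close together, the attention weights would spread across the support set and the prediction would become unreliable, regardless of how the fusion step is built.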