Would you still call this Dax? Novel Visual References in VLMs and Humans

ArXiv cs.CV | 2026-06-05 | Authors: Ada Defne Tür, Gaurav Kamath, Joyce Chai, Siva Reddy, Benno Krojer

Abstract

arXiv:2606.05409v1

Vision-language models (VLMs), like human learners, are frequently exposed to new visual concepts, but how they map novel visual references to language after exposure remains largely underexplored, particularly when those references contradict prior knowledge from pre-training. To study this, we present the Novel Visual References Dataset (NVRD): 19,176 images spanning 90 visual concepts across different levels of visual novelty, each with up to 20 increasingly perturbed versions of the original object to probe generalization. Unlike prior work on visual augmentations of familiar concepts, NVRD comprises entirely novel, open-ended stimuli constructed from scratch, mirroring how humans encounter genuinely new concepts.
