Visual Persuasion: What Influences Decisions of Vision-Language Models?

ArXiv CS.CV | 2026-06-02 | Authors: Manuel Cherep, Pranav M R, Pattie Maes, Nikhil Singh

Abstract

arXiv:2602.15278v2

The web is littered with images, once created for human consumption and now increasingly interpreted by agents using vision-language models (VLMs). These agents make visual decisions at scale, deciding what to click, recommend, or buy. Yet we know little about the structure of their visual preferences. We introduce a framework for studying this by placing VLMs in controlled image-based choice tasks and systematically perturbing their inputs. Our key idea is to treat the agent's decision function as a latent visual utility that can be inferred through revealed preference: choices between systematically edited images. Starting from common images, such as product photos, we propose methods for visual prompt optimization, adapting text optimization methods to iteratively propose and apply visually plausible modifications using an image generation model (such as in composition, lighting, or background).
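
To make the revealed-preference idea concrete, here is a minimal sketch of how a latent utility could be estimated from pairwise choices. It assumes a Bradley-Terry (logistic) choice model, which the abstract does not name, so it should be read as one possible illustration rather than the authors' method. The function name `fit_bradley_terry`, the three image variants, and the choice counts are all hypothetical.

```python
import math

def fit_bradley_terry(n_items, choices, iters=2000, lr=1.0):
    """Estimate one latent utility per image variant from pairwise choices.

    choices: list of (chosen_index, rejected_index) pairs.
    Assumes P(i chosen over j) = sigmoid(u_i - u_j) (an assumption of this
    sketch, not something stated in the abstract).
    Returns utilities centered to mean zero.
    """
    u = [0.0] * n_items
    for _ in range(iters):
        grad = [0.0] * n_items
        for chosen, rejected in choices:
            # Model probability that the observed choice would be made
            p = 1.0 / (1.0 + math.exp(-(u[chosen] - u[rejected])))
            grad[chosen] += 1.0 - p
            grad[rejected] -= 1.0 - p
        # Gradient ascent on the mean log-likelihood
        u = [ui + lr * g / len(choices) for ui, g in zip(u, grad)]
        # Utilities are only identified up to a constant, so center them
        mean = sum(u) / n_items
        u = [ui - mean for ui in u]
    return u

# Hypothetical choice data from repeated trials with a VLM:
# 0 = original product photo, 1 = brighter lighting, 2 = plain background
choices = (
    [(1, 0)] * 7 + [(0, 1)] * 3
    + [(2, 0)] * 8 + [(0, 2)] * 2
    + [(2, 1)] * 6 + [(1, 2)] * 4
)
utilities = fit_bradley_terry(3, choices)
ranking = sorted(range(3), key=lambda i: utilities[i], reverse=True)
print(ranking, [round(x, 3) for x in utilities])
```

Under these made-up counts, the variant chosen most often ends up with the highest estimated utility. In an actual study, each index would correspond to a systematically edited image, and each pair to one choice recorded from the model.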
