Comparing Human Gaze and Vision-Language Model Attention in Safety-Relevant Environments

ArXiv CS.CV | 2026-06-16 | NEWS | en | Authors: Marta Vallejo, Siwen Wang

Details

Source site: ArXiv CS.CV
Authors: Marta Vallejo, Siwen Wang
Article type: NEWS
Language: en
Publication date: 2026-06-16

Abstract

arXiv:2606.15202v1 (new submission). Human visual attention plays an important role in how people perceive and respond to environments containing potential risks. This study investigates whether large vision-language models can identify the same regions of a scene that attract human attention in safety-relevant environments. Eye-tracking data were collected with Pupil Invisible wearable glasses from ten participants viewing 33 scene images that represent environments with varying levels of potential risk. Gaze coordinates were mapped onto the stimulus images to generate population-averaged human gaze heatmaps. In parallel, GPT-4o was prompted through the OpenAI Vision Application Programming Interface (API) to generate spatial predictions of visual attention, which were converted into saliency maps for comparison with the human gaze patterns.
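As an illustration of the comparison the abstract describes, here is a minimal sketch of how gaze points might be turned into a heatmap and scored against a model-derived saliency map. The abstract does not state the smoothing method or the similarity metric, so the Gaussian kernel, the `sigma` value, the Pearson correlation score, and all function names and coordinates below are assumptions chosen for illustration, not the authors' pipeline.

```python
import numpy as np


def gaze_heatmap(points, width, height, sigma=25.0):
    """Build a heatmap by summing one isotropic Gaussian per (x, y) point.

    Assumption: Gaussian smoothing with a fixed sigma (in pixels); the
    abstract does not specify how its heatmaps were smoothed.
    The result is normalized to sum to 1 when at least one point is given.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width), dtype=np.float64)
    for x, y in points:
        heat += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
    total = heat.sum()
    return heat / total if total > 0 else heat


def pearson_cc(map_a, map_b):
    """Pearson correlation between two same-sized maps (0.0 if either is flat).

    Assumption: correlation is used here as one common saliency score;
    the abstract does not name the metric the study used.
    """
    a = map_a.ravel() - map_a.mean()
    b = map_b.ravel() - map_b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


# Hypothetical data: gaze points pooled across participants for one image,
# and attention points parsed from a model response for the same image.
human_points = [(120, 80), (125, 90), (300, 200)]
model_points = [(118, 85), (310, 190)]

human_map = gaze_heatmap(human_points, width=400, height=300)
model_map = gaze_heatmap(model_points, width=400, height=300)
print("similarity:", pearson_cc(human_map, model_map))
```

Pooling the gaze points of all participants before smoothing is one simple way to obtain a population-averaged map; averaging per-participant heatmaps would be an equally plausible reading of the abstract.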
