HorusEye: Language as Dynamic Attention for Emergency Visual Analysis

ArXiv CS.CV | 2026-06-16 | NEWS | en | Author: Armel Yara

Details

Source site
ArXiv CS.CV
Author
Armel Yara
Article type
NEWS
Language
en
Publication date
2026-06-16

Abstract

arXiv:2606.14741v1 Announce Type: new Abstract: We introduce HorusEye, Language as Dynamic Attention for Emergency Visual Analysis. Our investigation proceeded in five stages. The first stage benchmarks RefCOCO-Degraded, a dataset of 15,244 images (3,811 base images x 4 conditions: Clean, Fog, Smoke and Thermal) with systematic visual degradation. Through four research questions, we then evaluate multiple VLMs (Gemini, Qwen2-VL, BLIP-2, LLaVA, Kosmos-2) on visual grounding (second stage), language feedback recovery (third stage), health VQA tasks (fourth stage), and hallucination analysis (fifth stage). Our key finding is that the effectiveness of language feedback is model-dependent: under thermal conditions, Gemini improves by +47.3% with iterative language feedback, while Qwen2-VL degrades by -5.1% under the same protocol. We also identify a 'Thermal Paradox', in which cropping strategies that improve performance on RGB images fail catastrophically on thermal imagery.
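The dataset size quoted in the abstract is a full cross of base images and degradation conditions (3,811 x 4 = 15,244). The short Python sketch below checks that arithmetic by enumerating every (base image, condition) pair. The integer indexing and the helper name `enumerate_samples` are hypothetical: the abstract gives only the counts and condition names, not how RefCOCO-Degraded is actually organized or stored.

```python
from itertools import product

# Counts stated in the abstract: 3,811 base images, each under 4 conditions.
NUM_BASE_IMAGES = 3811
CONDITIONS = ("Clean", "Fog", "Smoke", "Thermal")


def enumerate_samples(num_base_images, conditions):
    """Return one (base_image_index, condition) pair per dataset image.

    Hypothetical indexing scheme for illustration only; the abstract does
    not describe how images are identified.
    """
    return list(product(range(num_base_images), conditions))


samples = enumerate_samples(NUM_BASE_IMAGES, CONDITIONS)
print(len(samples))  # 15244

# Each condition covers every base image exactly once.
thermal_count = sum(1 for _, condition in samples if condition == "Thermal")
print(thermal_count)  # 3811
```

Because every base image appears once per condition, each condition slice (for example, Thermal) contains 3,811 images, which is what allows a model's results to be compared across conditions on the same underlying scenes.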
