Look-Closer-Then-Diagnose: Confidence-Aware Ultrasound VQA via Active Zooming 文章

ArXiv CS.CV2026-05-26NEWSen作者: Yue Zhou, Erxuan Wu, Yikang Sun, Hongjoo Lee, Yuan Bi, Huixiong Xu, Nassir Navab, Zhongliang Jiang

摘要

arXiv:2605.21652v2 Announce Type: replace Abstract: Vision-Language Models (VLMs) have significantly advanced medical visual question answering, yet their performance in ultrasound remains suboptimal. In clinical practice, sonographers explicitly focus on lesion regions to formulate reports, though diagnostic interpretations sometimes vary due to inherent subjectivity. However, existing VLMs are not explicitly structured to interactively zoom into lesions prior to diagnosis; moreover, they typically treat annotations as unbiased ground truths, failing to account for their inherent subjectivity and ambiguity. In this paper, we propose a framework specifically designed to consider the sonographer's cognitive workflow. We first introduce a structured Zoom-then-Diagnose paradigm, which replicates the interactive search process to enable lesion-focused reasoning.

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据