Med-R2: An Adversarial Benchmark for Evidence-Grounded Reasoning in Medical VLMs

ArXiv CS.CV · 2026-05-26 · NEWS · en
Authors: Wen Ma, Fucheng Niu, Zhiting Fan, Zikai Xiao, Jiaxiang Liu, Zuozhu Liu

Abstract

arXiv:2605.24492v1 (announce type: new)

Vision-language models have demonstrated impressive capabilities in general medical visual question answering, yet because their interpretability is limited, it remains unclear whether their predictions reflect evidence-grounded clinical reasoning or reliance on spurious priors. We introduce Med-R2 Bench, a hierarchical benchmark aligned with the clinical workflow to evaluate adversarial robustness with visual grounding. We design stepwise QA tasks to assess whether reasoning chains are strictly grounded in visual evidence across the four clinical stages, and employ adversarial perturbations to test robustness against misleading cues. Med-R2 comprises 42,432 images, 31 task categories, and 110,406 QA pairs. Evaluation across 14 VLMs reveals a sequential performance degradation along the four-stage clinical workflow. Adversarial experiments show that models rely heavily on correct prompts to guess answers.
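As a rough illustration of the kind of evaluation the abstract describes, the sketch below computes accuracy per workflow stage and the gap between clean and adversarial conditions. The record layout, the numeric stage indices, the condition labels, and the sample data are all assumptions made for this example; the abstract does not specify the benchmark's data format, stage names, or scoring rule.

```python
from collections import defaultdict

# Hypothetical records: (stage index, condition, model answered correctly).
# These values are invented for illustration and are not results from the paper.
records = [
    (1, "clean", True), (1, "clean", True),
    (1, "adversarial", True), (1, "adversarial", False),
    (2, "clean", True), (2, "clean", False),
    (2, "adversarial", False), (2, "adversarial", False),
]

def accuracy_by_stage(records):
    """Return {stage: {condition: accuracy}} from (stage, condition, correct) tuples."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [correct, total]
    for stage, condition, correct in records:
        cell = counts[stage][condition]
        cell[0] += int(correct)
        cell[1] += 1
    return {
        stage: {cond: k / n for cond, (k, n) in conds.items()}
        for stage, conds in counts.items()
    }

def robustness_gap(acc):
    """Clean accuracy minus adversarial accuracy, per stage (assumed metric)."""
    return {
        stage: a["clean"] - a["adversarial"]
        for stage, a in acc.items()
        if "clean" in a and "adversarial" in a
    }

acc = accuracy_by_stage(records)
gaps = robustness_gap(acc)
```

Under this assumed setup, a larger gap at a given stage would indicate that the model's answers there depend more on the prompt being non-misleading than on the image evidence.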
