MedFM-Robust: Benchmarking Robustness of Medical Foundation Models

ArXiv CS.CV, 2026-05-26
Authors: Xiangxiang Cui, Tianjin Huang, Yifang Wang, Lijie Hu, Lu Yin

Abstract

arXiv:2605.19027v3

Medical foundation models have achieved remarkable clinical performance, yet their robustness under real-world perturbations remains underexplored. We present a robustness benchmark comprising 40 perturbation types (12 base, 28 medical-specific) across eight imaging modalities. It evaluates five VLMs (LLaVA-Med, MedGemma, MedGemma-1.5, Gemini-2.5-flash, and GPT-4o-mini) on VQA, visual grounding, and captioning, alongside two segmentation models (MedSAM, SAM-Med2D) with five fine-tuning strategies. Our findings reveal: (1) Fine-tuning strategy dominates robustness, with LoRA exhibiting nearly double the degradation of full fine-tuning, while SAM-Med2D's Adapter offers a favorable efficiency-robustness trade-off. (2) Medical-specific perturbations disproportionately damage segmentation, with 9 of 15 top corruptions being domain-specific.
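The abstract compares "degradation" across fine-tuning strategies without stating how it is measured. As a rough illustration only, one common way to quantify it is the relative drop from the clean score, averaged over perturbations. The sketch below assumes that definition; the function names, the perturbation names, and the scores are hypothetical placeholders, not the paper's metric or results.

```python
from statistics import mean


def relative_degradation(clean_score: float, perturbed_score: float) -> float:
    """Fraction of the clean score lost under one perturbation.

    Assumed definition for illustration; the paper may use a different metric.
    """
    if clean_score <= 0:
        raise ValueError("clean_score must be positive")
    return (clean_score - perturbed_score) / clean_score


def mean_degradation(clean_score: float, perturbed_scores: dict[str, float]) -> float:
    """Average relative degradation over a set of named perturbations."""
    return mean(
        relative_degradation(clean_score, score)
        for score in perturbed_scores.values()
    )


# Placeholder values only (NOT taken from the paper): a model scoring 1.0 on
# clean images and lower scores under two hypothetical perturbations.
placeholder_scores = {"gaussian_noise": 0.80, "motion_blur": 0.60}
print(f"mean relative degradation: {mean_degradation(1.0, placeholder_scores):.2f}")
```

Under this assumed definition, a strategy with "nearly double the degradation" would show a mean value close to twice that of the baseline strategy on the same perturbation set.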