Noise-Aware Visual Representation Learning for Medical Visual Question Answering


Details

Source site: ArXiv CS.CV
Authors: I Putu Adi Pratama, Bahadorreza Ofoghi, Atul Sajjanhar, Shang Gao
Article type: NEWS
Language: en
Publication date: 2026-06-05

Abstract

arXiv:2606.05535v1

Medical visual question answering (Med-VQA) has strong potential for clinical decision support by enabling AI models to interpret medical images and answer clinically relevant queries. Recent approaches typically connect off-the-shelf vision encoders with large language models (LLMs) through lightweight mapping networks to reduce computational cost. However, these methods often overlook the importance of handling noise and small irrelevant changes in visual representations. To address these challenges, we propose a noise-aware Med-VQA framework that incorporates a denoising autoencoder before visual embeddings are mapped into the input space of an LLM. The denoising autoencoder is pretrained to reconstruct clean visual embeddings from corrupted inputs, encouraging the model to learn robust visual representations that are less sensitive to noise.
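The denoising objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the autoencoder architecture, the corruption scheme, or the loss, so the linear encoder and decoder, the additive Gaussian noise, the mean squared error loss, the toy embedding size, and all names (`corrupt`, `train_step`, `NOISE_STD`, and so on) are assumptions chosen only to make the idea concrete. The synthetic "embeddings" stand in for the output of a vision encoder, and the later mapping into the LLM input space is omitted.

```python
import random

random.seed(0)

DIM, HIDDEN = 4, 2   # toy sizes (assumption); real vision embeddings are far larger
NOISE_STD = 0.1      # assumed Gaussian corruption level
LR = 0.05            # assumed learning rate for plain SGD


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def corrupt(x):
    """Add Gaussian noise to a clean embedding (assumed corruption scheme)."""
    return [xi + random.gauss(0.0, NOISE_STD) for xi in x]


def init(rows, cols):
    """Small random weight matrix."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def reconstruct(enc, dec, x):
    """Encode then decode an embedding with the linear autoencoder."""
    return matvec(dec, matvec(enc, x))


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def train_step(enc, dec, clean):
    """One SGD step: reconstruct the CLEAN embedding from a CORRUPTED copy."""
    noisy = corrupt(clean)
    z = matvec(enc, noisy)
    y = matvec(dec, z)
    # Gradient of the MSE loss with respect to the output and the hidden code.
    g_y = [2.0 * (yi - ci) / DIM for yi, ci in zip(y, clean)]
    g_z = [sum(dec[i][j] * g_y[i] for i in range(DIM)) for j in range(HIDDEN)]
    for i in range(DIM):
        for j in range(HIDDEN):
            dec[i][j] -= LR * g_y[i] * z[j]
    for j in range(HIDDEN):
        for k in range(DIM):
            enc[j][k] -= LR * g_z[j] * noisy[k]


# Synthetic clean embeddings that lie on a 2-D subspace of the 4-D space.
clean_set = []
for _ in range(32):
    a, b = random.uniform(-1, 1), random.uniform(-1, 1)
    clean_set.append([a, b, a, b])

# Fixed corrupted copies, used only to measure reconstruction error.
eval_noisy = [corrupt(x) for x in clean_set]


def eval_loss(enc, dec):
    pairs = zip(eval_noisy, clean_set)
    return sum(mse(reconstruct(enc, dec, n), c) for n, c in pairs) / len(clean_set)


enc, dec = init(HIDDEN, DIM), init(DIM, HIDDEN)
before = eval_loss(enc, dec)
for _ in range(300):
    for clean in clean_set:
        train_step(enc, dec, clean)
after = eval_loss(enc, dec)
print(f"reconstruction loss before: {before:.4f}, after: {after:.4f}")
```

In a pipeline like the one the abstract outlines, the pretrained autoencoder would sit between the vision encoder and the mapping network, so that the mapping network receives the denoised reconstruction (or the hidden code) rather than the raw embedding. Which of the two the authors use is not stated in the abstract.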
