详细信息
- 来源站点
- ArXiv CS.CV
- 作者
- I Putu Adi Pratama, Bahadorreza Ofoghi, Atul Sajjanhar, Shang Gao
- 文章类型
- NEWS
- 语言
- en
- 发布日期
- 2026-06-05
摘要
arXiv:2606.05535v1 Announce Type: new Abstract: Medical visual question answering (Med-VQA) has strong potential for clinical decision support by enabling AI models to interpret medical images and answer clinically relevant queries. Recent approaches typically connect off-the-shelf vision encoders with large language models (LLMs) through lightweight mapping networks to reduce computational cost. However, these methods often overlook the importance of handling noise and small irrelevant changes in visual representations. To address these challenges, we propose a noise-aware Med-VQA framework that incorporates a denoising autoencoder before visual embeddings are mapped into the input space of an LLM. The denoising autoencoder is pretrained to reconstruct clean visual embeddings from corrupted inputs, encouraging the model to learn robust visual representations that are less sensitive to noise.
相关事件
暂无数据
相关公司
暂无数据
相关人物
暂无数据
相关产品
暂无数据