Context-driven Missing-Modality Learning for Robust Medical Diagnosis with Image-Tabular Data

arXiv cs.CV, 2026-05-26
Authors: Tianling Liu, Lequan Yu, Tong Han, Liang Wan

Abstract

arXiv:2605.25968v1 (Announce Type: new)

While multimodal data integrating diverse imaging and clinical tabular records is crucial for accurate medical diagnosis, the arbitrary absence of specific modalities is prevalent in clinical practice and severely degrades the performance of multimodal models. Existing methods either discard missing modalities, leading to information loss, or struggle to synthesize them without capturing complex inter-modal dependencies. To address these limitations, we propose a novel Context-driven Missing-Modality Learning (CMML) framework, which sequentially performs modality synthesis and semantic alignment to achieve robust diagnosis under arbitrary missing conditions. Specifically, we design a Cascade Residual Transformer-based Autoencoder (CRTA) that leverages learnable context tokens, acting as a dataset-level semantic prior, to capture inter-modal dependencies and synthesize key missing representations.
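The abstract describes CRTA only at a high level. The following is a minimal, untrained NumPy sketch of the general idea under our own assumptions, not the authors' implementation: each modality (imaging or tabular) is assumed to be already encoded into a fixed-size embedding, a missing modality is replaced by a shared mask token, and a few context tokens shared across all samples (standing in for the dataset-level prior) are prepended before a cascade of residual self-attention stages. The names `ContextTokenImputer`, `n_context`, `n_stages`, `mask_token`, and `type_emb` are hypothetical.

```python
import itertools

import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ContextTokenImputer:
    """Hypothetical sketch (not the authors' CRTA): shared context tokens plus
    cascaded residual self-attention stages that fill in missing modality embeddings.
    All parameters are random here; in practice they would be trained."""

    def __init__(self, n_modalities, dim, n_context=4, n_stages=2, seed=0):
        init_rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.dim = dim
        # Dataset-level prior: the same context tokens are shared by every sample.
        self.context = init_rng.normal(0.0, scale, (n_context, dim))
        # Placeholder embedding written into the slot of any missing modality.
        self.mask_token = init_rng.normal(0.0, scale, (dim,))
        # Per-modality type embeddings so the slots stay distinguishable.
        self.type_emb = init_rng.normal(0.0, scale, (n_modalities, dim))
        # One (W_q, W_k, W_v) projection triple per cascade stage.
        self.stages = [
            tuple(init_rng.normal(0.0, scale, (dim, dim)) for _ in range(3))
            for _ in range(n_stages)
        ]

    def _attend(self, tokens, w_q, w_k, w_v):
        # Single-head scaled dot-product self-attention over all tokens.
        q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
        weights = softmax(q @ k.T / np.sqrt(self.dim), axis=-1)
        return weights @ v

    def __call__(self, feats, present):
        """feats: (n_modalities, dim) embeddings; present: bool mask (n_modalities,)."""
        present = np.asarray(present, dtype=bool)[:, None]
        slots = np.where(present, feats, self.mask_token) + self.type_emb
        n_ctx = self.context.shape[0]
        tokens = np.concatenate([self.context, slots], axis=0)
        for w_q, w_k, w_v in self.stages:
            # Residual refinement: each stage adds a correction to the previous tokens.
            tokens = tokens + self._attend(tokens, w_q, w_k, w_v)
        synthesized = tokens[n_ctx:]
        # Observed modalities pass through unchanged; only missing slots are filled.
        return np.where(present, feats, synthesized)


# Usage: three hypothetical modality embeddings (e.g. two image encoders + one tabular encoder).
rng = np.random.default_rng(1)
feats = rng.normal(size=(3, 8))
model = ContextTokenImputer(n_modalities=3, dim=8)
filled = model(feats, present=[True, False, True])

# "Arbitrary missing conditions": every non-empty availability pattern over 3 modalities.
patterns = [p for p in itertools.product([True, False], repeat=3) if any(p)]
outputs = [model(feats, p) for p in patterns]
```

Because observed embeddings pass through unchanged, the same module could sit in front of a downstream classifier regardless of which modalities a patient has. With random weights the output only illustrates shapes and data flow; one plausible training signal (our assumption) would be reconstructing artificially dropped modalities. The semantic-alignment step that CMML performs after synthesis is not modeled here.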