PMC-InterCPT: Rethinking Biomedical Interleaved Data for Multimodal Continued Pretraining
Event: PRODUCT_LAUNCH, 2026-06-02. Impact: MEDIUM
arXiv:2606.01049v1. Announce Type: new.
Abstract: Large-scale biomedical image-text datasets extracted from scientific literature provide valuable resources for medical multimodal model training. These datasets are commonly organized as image-caption pairs; however, figure captions are often short, context-dependent, and only partially informative without the surrounding article text. At the same time, large