IsoCLIP: Decomposing CLIP Projectors for Efficient Intra-modal Alignment · Event
PRODUCT_LAUNCH · 2026-06-01 · Impact: MEDIUM
arXiv:2603.19862v2 · Announce Type: replace
Abstract: Vision-Language Models like CLIP are extensively used for inter-modal tasks which involve both visual and text modalities. However, when the individual modality encoders are applied to inherently intra-modal tasks like image-to-image retrieval, their performance suffers from the intra-modal misalignment. In this paper we study intra-modal misalignment in CLIP with a focus
Related coverage
IsoCLIP: Decomposing CLIP Projectors for Efficient Intra-modal Alignment
ArXiv CS.CV · 2026-06-01