COMET: Concept Space Dissection of the Modality Gap in Audio-Text Multimodal Contrastive Embeddings (Event)

PRODUCT_LAUNCH 2026-05-29 Impact: MEDIUM

arXiv:2605.29628v1 Announce Type: cross

Abstract: Contrastive Language-Audio Pretraining (CLAP) models are widely used for audio understanding and support modality-agnostic condition swapping in many zero-shot applications. However, their performance is heavily affected by the modality gap between audio and text embeddings. Existing explanations mainly attribute this gap to the cone effect, treat
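
The modality gap and cone effect named in the abstract can be made concrete with a small sketch. It assumes one common way of quantifying them, which is not necessarily the measure used in the paper: the gap is taken as the Euclidean distance between the centroids of the L2-normalized audio and text embeddings, and the cone effect as a high average pairwise cosine similarity within one modality. The function names and the synthetic vectors are hypothetical stand-ins, not CLAP outputs.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def modality_gap(audio_embs, text_embs):
    """Distance between the centroids of the normalized embeddings.

    Assumption: this centroid-distance definition is one common choice,
    used here only for illustration.
    """
    a = centroid([l2_normalize(v) for v in audio_embs])
    t = centroid([l2_normalize(v) for v in text_embs])
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, t)))


def mean_pairwise_cosine(embs):
    """Average cosine similarity over all distinct pairs.

    Values close to 1 mean the embeddings occupy a narrow cone.
    """
    unit = [l2_normalize(v) for v in embs]
    total, count = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            total += sum(x * y for x, y in zip(unit[i], unit[j]))
            count += 1
    return total / count


def toy_embeddings(rng, axis, n=50, dim=8, noise=0.1):
    """Synthetic embeddings clustered around one coordinate axis (hypothetical data)."""
    return [
        [rng.gauss(0.0, noise) + (1.0 if k == axis else 0.0) for k in range(dim)]
        for _ in range(n)
    ]


rng = random.Random(0)
audio = toy_embeddings(rng, axis=0)  # stand-in for audio embeddings
text = toy_embeddings(rng, axis=1)   # stand-in for text embeddings

print("modality gap:", modality_gap(audio, text))
print("audio cone tightness:", mean_pairwise_cosine(audio))
print("text cone tightness:", mean_pairwise_cosine(text))
```

With real embeddings, `audio` and `text` would be replaced by the outputs of the two encoders for paired clips and captions. A large gap together with high within-modality cosine similarity matches the cone-effect picture that the abstract says existing explanations rely on.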