FIGMA: Towards FIne-Grained Music retrievAl 文章

ArXiv CS.AI2026-06-08NEWSen作者: Nishit Anand, Ashish Seth, Sreyan Ghosh, Dinesh Manocha, Ramani Duraiswami

摘要

arXiv:2606.06615v1 Announce Type: cross Abstract: Retrieving music using natural language descriptions has improved with contrastive audio-text models such as CLAP, but current systems remain limited to coarse semantic queries. When descriptions specify fine-grained musical attributes such as tempo, key, chord progression, or rhythmic structure, existing models often fail to retrieve the correct audio. We show that this limitation stems from the contrastive learning objective itself: despite being trained on long captions, CLAP-based models effectively utilize only the first few tokens, discarding much of the information encoded in detailed prompts. Then, we propose FIGMA (FIne-Grained Music RetrievAl), a multi-view contrastive architecture that addresses this limitation by jointly optimizing global audio-text alignment and frame-level, token-wise alignment.

相关事件查看全部 (1)

FIGMA: Towards FIne-Grained Music retrievAl
2026-06-08PRODUCT_LAUNCH影响: MEDIUM

相关公司

暂无数据

相关人物

暂无数据

相关技术

暂无数据