Multimodal Concept Bottleneck Models (article)

ArXiv CS.CV | 2026-06-19 | NEWS | en | Authors: Tongqing Shi, Ge Yan, Tuomas Oikarinen, Tsui-Wei Weng

Details

Source site: ArXiv CS.CV
Authors: Tongqing Shi, Ge Yan, Tuomas Oikarinen, Tsui-Wei Weng
Article type: NEWS
Language: en
Publication date: 2026-06-19

Abstract

arXiv:2606.19882v1 (announce type: new)

Concept Bottleneck Models (CBMs) enhance the interpretability of deep learning networks by aligning the features extracted from images with natural concepts. However, existing CBMs are limited in two ways: they cannot generalize beyond a fixed set of predefined classes, and they risk non-concept information leakage, where predictive signals outside the intended concepts are inadvertently exploited. In this paper, we propose the Multimodal Concept Bottleneck Model (MM-CBM) to address these issues and extend CBMs to CLIP. MM-CBM uses dual Concept Bottleneck Layers (CBLs) to align both the image and text embeddings into interpretable features. This allows us to perform new vision tasks, such as zero-shot classification and image retrieval, in an interpretable way. Compared to existing methods, MM-CBM achieves up to 51.26% accuracy improvement on average across four standard benchmarks.
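The dual-CBL idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the layer form, training objective, or scoring rule, so the code assumes each CBL is a single linear map into a shared concept space and that zero-shot scores are cosine similarities between concept vectors. All names (`concept_scores`, `zero_shot_classify`, `image_cbl`, `text_cbl`) are hypothetical, and random arrays stand in for CLIP embeddings and learned weights.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def concept_scores(embeddings, cbl_weights):
    """Assumed CBL: a linear map from embedding space (d) to concept space (k)."""
    return embeddings @ cbl_weights.T


def zero_shot_classify(image_emb, text_emb, image_cbl, text_cbl):
    """Compare images and class prompts inside the concept space.

    Returns the predicted class per image, the similarity matrix, and the
    per-concept contribution to every image/class similarity.
    """
    img_c = l2_normalize(concept_scores(image_emb, image_cbl))  # (n_img, k)
    txt_c = l2_normalize(concept_scores(text_emb, text_cbl))    # (n_cls, k)
    sims = img_c @ txt_c.T                                       # (n_img, n_cls)
    # Each similarity is a sum over concepts, so it can be read concept by concept.
    contrib = img_c[:, None, :] * txt_c[None, :, :]              # (n_img, n_cls, k)
    return sims.argmax(axis=1), sims, contrib


# Placeholder data: random stand-ins for CLIP embeddings and trained CBL weights.
rng = np.random.default_rng(0)
d, k = 16, 5                                # embedding size, number of concepts
image_emb = rng.normal(size=(3, d))         # 3 images
text_emb = rng.normal(size=(4, d))          # 4 class prompts
image_cbl = rng.normal(size=(k, d))         # hypothetical image-side CBL
text_cbl = rng.normal(size=(k, d))          # hypothetical text-side CBL

pred, sims, contrib = zero_shot_classify(image_emb, text_emb, image_cbl, text_cbl)
print(pred.shape, sims.shape, contrib.shape)
```

Under these assumptions, the prediction depends only on the concept vectors, and `contrib[i, j]` shows how much each concept adds to the match between image `i` and class `j`. The same similarity matrix could rank images for a text query, which is one plausible reading of the interpretable retrieval task the abstract mentions.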
