When Interpretability Becomes a Liability: Adversarial Attacks on CBM Concept Layers · Event

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2605.25304v1 · Announce Type: cross

Abstract: Concept Bottleneck Models (CBMs) have emerged as a cornerstone approach for interpretable machine learning, providing human-understandable intermediate representations through explicit concept activations. However, this interpretability fundamentally introduces a critical, previously unexplored attack surface: the concept bottleneck layer itself. We present a co…
