Erased but Not Forgotten: How Backdoors Compromise Concept Erasure

ArXiv CS.AI | 2026-06-02 | NEWS | en | Authors: Tobias Braun, Jonas Henry Grebe, Marcus Rohrbach, Anna Rohrbach

Details

Source: ArXiv CS.AI
Authors: Tobias Braun, Jonas Henry Grebe, Marcus Rohrbach, Anna Rohrbach
Article type: NEWS
Language: en
Published: 2026-06-02

Abstract

arXiv:2504.21072v2 (announce type: replace-cross)

The expansion of text-to-image diffusion models has raised concerns about harmful outputs, from fabricated depictions of public figures to sexually explicit imagery. To mitigate such risks, prior work has proposed concept erasure methods that aim to sever unwanted concepts from the model via fine-tuning, yet it remains unclear whether these approaches truly remove all links to the harmful concept or merely conceal superficial connections. In this work, we reveal a critical vulnerability, the Erasure Evasion Backdoor (EEB): an adversary binds a backdoor trigger to a concept slated for removal, and this malicious link survives subsequent erasure. We show that both black-box and white-box adversaries can instantiate this threat.
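The abstract does not say how the backdoor or the erasure step is implemented, so the following is only a toy sketch of the general idea, not the authors' method. It assumes a linear stand-in for the model (a matrix `W` mapping a prompt embedding to an output vector), mutually orthogonal prompt embeddings, and a rank-one edit standing in for erasure fine-tuning. All names (`rank_one_erase`, `e_trigger`, `v_harmful`, and so on) are hypothetical.

```python
# Toy illustration only: a linear "model" W maps a prompt embedding to an
# output vector. Nothing here is taken from the paper; it is an assumption-
# laden stand-in for a diffusion model and for erasure fine-tuning.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rank_one_erase(W, e_concept, v_anchor):
    """Return W' with W' @ e_concept == v_anchor, changing W only along e_concept."""
    residual = [a - c for a, c in zip(v_anchor, matvec(W, e_concept))]
    scale = dot(e_concept, e_concept)
    return [[w + r * e / scale for w, e in zip(row, e_concept)]
            for row, r in zip(W, residual)]

# Prompt embeddings, orthogonal by assumption: concept name, neutral anchor,
# and the adversary's backdoor trigger.
e_concept = [1.0, 0.0, 0.0]
e_anchor = [0.0, 1.0, 0.0]
e_trigger = [0.0, 0.0, 1.0]

v_harmful = [1.0, 0.0]   # output standing in for the unwanted concept
v_neutral = [0.0, 1.0]   # output standing in for a harmless replacement

# Poisoned model: each column is the output for one input direction, and the
# trigger column has been bound to the same harmful output as the concept.
W_poisoned = [[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]]

# "Erase" the concept by redirecting its name to the neutral output.
W_erased = rank_one_erase(W_poisoned, e_concept, v_neutral)

concept_after = matvec(W_erased, e_concept)   # concept name now looks erased
trigger_after = matvec(W_erased, e_trigger)   # trigger still reaches the concept

print("concept prompt ->", concept_after)
print("trigger prompt ->", trigger_after)
```

In this toy setting the edit only changes the model along the concept's own input direction, so a trigger bound to a different direction is untouched. Real text encoders do not produce orthogonal embeddings and real erasure methods fine-tune many weights, so this shows only why such a link could survive, not that it does.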
