Backdoor Unlearning Generalization: A Path Toward the Removal of Unknown Triggers in LLMs
Event: PRODUCT_LAUNCH | 2026-06-03 | Impact: MEDIUM
arXiv:2606.03785v1. Announce Type: new.
Abstract: Backdoor attacks on Large Language Models (LLMs) are a growing security concern, because a compromised model can generate adversary-chosen content. Existing defenses target backdoors one at a time and typically require knowledge of the trigger, leaving the defender at a structural disadvantage when unknown backdoors may exist in a model. We show that backdoor neutralization