Don't Forget Your Embeddings: Robust Knowledge Erasure via Precise Editing of Embeddings

arXiv cs.CL, 2026-06-03. Authors: Clara Haya Suslik, Or Shafran, Mor Geva

Abstract

arXiv:2606.03695v1 (Announce Type: new)

As language models are increasingly deployed in real-world applications, the ability to erase specific knowledge from them becomes critical for safety and compliance. Prominent methods seek persistent removal by updating the model's parameters, yet the target knowledge often can be recovered through adversarial prompting or relearning. In this work, we hypothesize this limitation stems in part from existing methods overlooking the embedding layer. To address this, we introduce EMBedding ERasure (EMBER), a plug-n-play erasure module that leverages Sparse Matrix Factorization for precise erasure of concept-related features from token embeddings. Through comprehensive evaluations across diverse concepts on Gemma-2-2B-it and Llama-3.1-8B-Instruct, we find that augmenting existing methods with EMBER consistently improves erasure efficacy and specificity across task formats, with minimal coherence loss.
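The abstract describes EMBER only at a high level. To illustrate the general idea of erasing concept-related features from token embeddings through a sparse factorization, the sketch below assumes the embedding matrix has already been factorized as E ≈ C·D, with sparse per-token codes C and dictionary atoms D, and that the atoms tied to the target concept are already known. The function name `erase_concept`, the dict-based code layout, and the choice to subtract only the concept atoms' contributions (keeping any factorization residual) are all hypothetical. This is not the paper's algorithm, only a minimal illustration of the technique it names.

```python
from typing import Dict, List, Sequence, Set

Vector = List[float]


def erase_concept(
    embeddings: Sequence[Vector],
    sparse_codes: Sequence[Dict[int, float]],
    atoms: Sequence[Vector],
    concept_atoms: Set[int],
) -> List[Vector]:
    """Remove the contribution of concept-related atoms from each token embedding.

    Hypothetical setup: row i of `embeddings` is approximately
    sum(w * atoms[k] for k, w in sparse_codes[i].items()).
    Only the terms whose atom index is in `concept_atoms` are subtracted, so
    tokens that never use those atoms come back exactly unchanged, and the
    part of each embedding the factorization does not explain is preserved.
    """
    if len(embeddings) != len(sparse_codes):
        raise ValueError("need one sparse code per embedding row")
    edited: List[Vector] = []
    for row, codes in zip(embeddings, sparse_codes):
        new_row = list(row)  # copy so the input matrix is not mutated
        for atom_idx, weight in codes.items():
            if atom_idx in concept_atoms:
                # Subtract this atom's weighted contribution from the embedding.
                for j, value in enumerate(atoms[atom_idx]):
                    new_row[j] -= weight * value
        edited.append(new_row)
    return edited


if __name__ == "__main__":
    # Toy example: three orthonormal atoms; atom 2 stands in for the concept.
    atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    embeddings = [[2.0, 0.1, 0.5], [0.0, 1.0, 0.0]]
    codes = [{0: 2.0, 2: 0.5}, {1: 1.0}]
    print(erase_concept(embeddings, codes, atoms, {2}))
```

In the toy run, the first token loses only its concept component (the 0.1 residual survives), while the second token, which never uses the concept atom, is returned unchanged. Subtracting contributions rather than replacing rows with their reconstruction is one way to keep such an edit targeted.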
