Multilingual Unlearning in LLMs: Transfer, Dynamics, and Reversibility

arXiv cs.CL, 2026-06-03. Authors: Chaoyi Xiang, Olga Ohrimenko, Benjamin I. P. Rubinstein, Lea Frermann

Abstract

arXiv:2606.03291v1

Large language models (LLMs) can memorize sensitive facts, motivating unlearning methods that remove targeted knowledge without costly retraining. However, unlearning research remains heavily English-centric. We study multilingual unlearning by extending the TOFU benchmark to five languages, and fine-tune, unlearn, and query our models with different permutations of languages. We find that unlearning transfer, the ability of an unlearned model to "forget" facts in languages other than the unlearning language, is highly variable: e.g., it is strongest between languages sharing scripts and families, and we show that the unlearning language predicts which query languages are most likely to yield the strongest transfer. Layer-wise analysis reveals that unlearning leaves the shared cross-lingual latent space largely intact in early layers, instead operating primarily in later decoding layers.
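
To make the experimental design concrete, the following is a minimal sketch of how the fine-tune, unlearn, and query language combinations could be enumerated. It rests on assumptions: the abstract does not name the five languages, so `L1` to `L5` are placeholder labels, and it does not say whether every combination was run, so the full cross product here is illustrative only. The helper names `experiment_grid` and `is_transfer_condition` are hypothetical.

```python
from itertools import product

# Placeholder labels: the abstract does not name the five languages.
LANGUAGES = ["L1", "L2", "L3", "L4", "L5"]


def experiment_grid(languages):
    """Return every (fine-tune, unlearn, query) language triple.

    Assumption: a full cross product. The abstract only says
    "different permutations of languages".
    """
    return list(product(languages, repeat=3))


def is_transfer_condition(unlearn_lang, query_lang):
    """A triple probes unlearning transfer when the model is queried
    in a language other than the one used for unlearning."""
    return unlearn_lang != query_lang


grid = experiment_grid(LANGUAGES)
transfer = [t for t in grid if is_transfer_condition(t[1], t[2])]

print(len(grid), len(transfer))  # prints: 125 100
```

Under these assumptions, 100 of the 125 triples query in a language different from the unlearning language, which are the cases where transfer as defined in the abstract would be measured.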
