Learning What to Forget: Improving LLM Unlearning via Learned Token-Level Importance

arXiv cs.CL | 2026-06-05 | Authors: Gizem Yüce, Giorgos Nikolaou, Nicolas Flammarion

Abstract

arXiv:2606.06320v1 (announce type: cross)

Machine unlearning aims to remove targeted knowledge from a trained model while preserving its general capabilities. For autoregressive language models, not all tokens in a forget sample are equally relevant to forgetting. Existing approaches either ignore this heterogeneity or rely on auxiliary models, heuristics, or external annotations to estimate each token's relevance for forgetting. We instead characterize it through the interaction with the retain objective: a token is forget-specific to the extent that minimizing the forget loss on that token does not conflict with retain optimality. We formalize this perspective as a joint optimization problem over the model parameters and the token weights and show that, under a natural separation condition, the resulting objective recovers the oracle forget-specific token support.
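
To make the idea of token-level weighting concrete, the sketch below computes a per-token negative log-likelihood and combines it with per-token weights in [0, 1]. It is only an illustration of what a token-weighted loss on a forget sample could look like; the abstract does not give the paper's actual objective, and the function names `token_nll` and `weighted_forget_loss`, the normalization by the weight sum, and the toy logits are all assumptions made for this example. How the weights are learned jointly with the model parameters is not shown.

```python
import math


def token_nll(logits, target):
    """Negative log-likelihood of `target` under softmax(logits).

    Uses the log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def weighted_forget_loss(per_token_logits, targets, weights):
    """Weighted mean of per-token NLLs over one forget sample.

    Hypothetical helper: `weights[i]` in [0, 1] says how much token i
    counts as forget-specific. A weight of 0 removes the token from the
    loss entirely. Returns 0.0 if all weights are zero.
    """
    if not (len(per_token_logits) == len(targets) == len(weights)):
        raise ValueError("inputs must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(
        w * token_nll(logits, t)
        for logits, t, w in zip(per_token_logits, targets, weights)
    )
    return weighted_sum / total_weight


# Toy forget sample: three tokens over a vocabulary of size 2.
logits = [[2.0, 0.0], [0.0, 0.0], [0.5, 1.5]]
targets = [0, 1, 1]

uniform = weighted_forget_loss(logits, targets, [1.0, 1.0, 1.0])
selective = weighted_forget_loss(logits, targets, [1.0, 0.0, 0.0])
print(f"uniform weights:   {uniform:.4f}")
print(f"first token only:  {selective:.4f}")
```

With uniform weights every token contributes equally, which corresponds to ignoring the heterogeneity the abstract describes; with a sparse weight vector only the selected tokens contribute to the quantity an unlearning objective would then act on.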
