On the Hidden Costs of Counterfactual Knowledge Training in LLM Unlearning

PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2605.27083v1 (announce type: new)

Abstract: Counterfactual tuning (CFT) has emerged as a promising paradigm for Large Language Model (LLM) unlearning by training models to generate alternative fictitious knowledge in place of undesired content. However, in this work, we find that this paradigm still underperforms other paradigms in some aspects, and identify two previously overlooked pitfalls underlying this gap: (1)
