Model Unlearning Objectives Vary for Distinct Language Functions
Event: PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM
arXiv:2605.26454v1 Announce Type: new
Abstract: Large language models (LLMs) learn undesirable properties during pretraining, including dangerous knowledge and toxic text generation. Just as post-training uses different objectives to shape different behaviors, we argue that unlearning methods should be designed for the language function at issue. To study this, we consider two mechanistically distinct unlearning goals, dangerous-k
ArXiv CS.CL, 2026-05-27