Measuring the Depth of LLM Unlearning via Activation Patching (Event)

Name: Measuring the Depth of LLM Unlearning via Activation Patching
Start: 2026-05-26

REGULATION | 2026-05-26 | Impact: MEDIUM

arXiv:2605.24614v1 (Announce Type: new)

Abstract: Large language model (LLM) unlearning has emerged as a crucial post-hoc mechanism for privacy protection and AI safety, yet auditing whether target knowledge is truly erased remains challenging. Existing output-level metrics fail to detect when this knowledge remains recoverable from internal representations. Recent white-box studies reveal such residual knowledge but often rely on auxi

Artificial Intelligence
