Representation-Aware Unlearning via Activation Signatures: From Suppression to Entity-Signature Erasure

Source: ArXiv CS.CL | Published: 2026-05-27 | Type: NEWS | Language: en
Authors: Syed Naveed Mahmood, Md. Rezaur Rahman Bhuiyan, Tasfia Zaman, Jareen Tasneem Khondaker, Md. Sameer Sakib, K. M. Shadman Wadith, Nazia Tasnim, Farig Sadeque



Abstract

arXiv:2601.10566v5 (announce type: replace)

Entity-level unlearning is usually evaluated by what a model says: whether it stops naming the target, refuses a query, or shifts a Truth Ratio distribution. These output-level tests, however, do not show whether a subject's internal representation has been attenuated. We introduce the Entity Representation Unlearning Framework (ERUF), a representation-aware framework that mines subject-specific activation signatures, suppresses the corresponding activation direction, and distills the behavior into LoRA parameters. Among evaluated baselines, ERUF is the only method that jointly achieves surface-level suppression, internal attenuation, and utility preservation. On TOFU forget10, ERUF achieves FQ = 0.99 and MU = 0.62, matching reported oracle utility while approaching oracle forget quality. Across most standard foundation-model settings, ERUF maintains low leakage and low internal target activation, with SMR between 0.00% and 1.
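The abstract does not spell out how the activation direction is found or suppressed, so the following is only a generic illustration of the idea, not ERUF itself. It assumes a difference-of-means direction between synthetic "target" and "reference" activations and removes that component by projection. Every name here (`mean_difference_direction`, `suppress_direction`, `HIDDEN`, `ENTITY_DIM`) and all of the data are hypothetical, and the LoRA distillation step is not covered.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_vector(rows):
    """Column-wise mean of a list of activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_difference_direction(target_rows, reference_rows):
    """Unit vector from the reference mean to the target mean.

    Assumption: a difference of means stands in for the mined
    "activation signature"; the paper may use a different procedure.
    """
    diff = [t - r for t, r in zip(mean_vector(target_rows),
                                  mean_vector(reference_rows))]
    norm = math.sqrt(dot(diff, diff))
    if norm == 0.0:
        raise ValueError("target and reference means coincide")
    return [d / norm for d in diff]


def suppress_direction(row, direction, strength=1.0):
    """Subtract `strength` times the component of `row` along `direction`."""
    coeff = dot(row, direction)
    return [x - strength * coeff * d for x, d in zip(row, direction)]


# Synthetic activations (hypothetical data, not from the paper):
# target prompts carry an extra offset along one hidden dimension.
rng = random.Random(0)
HIDDEN = 16
ENTITY_DIM = 3
reference = [[rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]
             for _ in range(200)]
target = [[rng.gauss(0.0, 1.0) + (4.0 if j == ENTITY_DIM else 0.0)
           for j in range(HIDDEN)]
          for _ in range(200)]

direction = mean_difference_direction(target, reference)
cleaned = [suppress_direction(row, direction) for row in target]

# Largest remaining component along the suppressed direction.
residual = max(abs(dot(row, direction)) for row in cleaned)
print("residual component along direction:", residual)
```

With `strength=1.0` the projection removes the whole component along the direction; smaller values would only attenuate it. In a real model the vectors would come from hidden states at a chosen layer rather than from a random generator.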
