InstantForget: Update-Free Backdoor Unlearning with Inference-Time Feature Reset

Details

Source: ArXiv CS.AI
Author: Zhenyu Yu
Article type: NEWS
Language: en
Published: 2026-06-16

Abstract

arXiv:2606.15730v1 (announce type: cross)

Backdoor unlearning aims to remove a malicious trigger behavior from a deployed model while preserving clean utility. We study the update-free, inference-time setting, in which model parameters remain frozen. First, we audit a common projection assumption using oracle pairs of clean and triggered features. Projection succeeds mainly on BadNets; on CIFAR-10 with ResNet-18 it leaves WaNet, Blended, and SIG at attack success rates (ASR) of 0.683, 0.888, and 0.941, respectively. This failure is not explained by spectral compactness, spatial locality, or subspace misalignment. Instead, it is predicted by a logit-triplet gap involving the target margin, the target-logit drop, and the non-target logit rise. We then introduce InstantForget, a clean-calibrated gated reset that flags anomalous features with a Mahalanobis score and moves only the flagged features toward a neutral non-target representation.
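To make the gated-reset idea concrete, the following is a minimal NumPy sketch of a Mahalanobis-gated feature reset. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the threshold is calibrated, what the neutral representation is, or how far flagged features are moved. Here the threshold is assumed to be a high quantile of clean scores, the neutral point is assumed to be the clean feature mean, and the function names, `eps`, and `alpha` are hypothetical.

```python
import numpy as np

def fit_clean_stats(clean_feats, eps=1e-3):
    """Estimate the mean and a regularized inverse covariance of clean features."""
    mu = clean_feats.mean(axis=0)
    cov = np.cov(clean_feats, rowvar=False)
    cov += eps * np.eye(cov.shape[0])  # assumption: small ridge for invertibility
    return mu, np.linalg.inv(cov)

def mahalanobis_sq(feats, mu, cov_inv):
    """Squared Mahalanobis distance of each row of feats to the clean mean."""
    diff = feats - mu
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def gated_reset(feats, mu, cov_inv, threshold, neutral, alpha=1.0):
    """Move only flagged rows toward `neutral`; unflagged rows pass through."""
    flagged = mahalanobis_sq(feats, mu, cov_inv) > threshold
    out = feats.copy()
    out[flagged] = (1.0 - alpha) * feats[flagged] + alpha * neutral
    return out, flagged

# Synthetic demo: Gaussian "clean" features and a crude shifted "triggered" set.
rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 8))
mu, cov_inv = fit_clean_stats(clean)

# Assumption: calibrate the gate at the 99th percentile of clean scores.
threshold = np.quantile(mahalanobis_sq(clean, mu, cov_inv), 0.99)
neutral = mu  # assumption: stand-in for a "neutral non-target representation"

triggered = clean[:5] + 6.0  # synthetic anomaly, not a real backdoor trigger
batch = np.vstack([clean[5:10], triggered])
reset, flagged = gated_reset(batch, mu, cov_inv, threshold, neutral)
print("flagged rows:", np.flatnonzero(flagged))
```

In a real pipeline the features would come from an intermediate layer of the frozen model and the reset features would be passed to the remaining layers; the model weights themselves are never updated, which is what makes the setting update-free.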
