InstantForget: Update-Free Backdoor Unlearning with Inference-Time Feature Reset

ArXiv CS.AI | 2026-06-16 | NEWS | en | Author: Zhenyu Yu

Details

Source site
ArXiv CS.AI
Author
Zhenyu Yu
Article type
NEWS
Language
en
Publication date
2026-06-16

Abstract

arXiv:2606.15730v1. Announce type: cross.

Backdoor unlearning aims to remove a malicious trigger behavior from a deployed model while preserving clean utility. We study the update-free inference-time setting, where model parameters remain frozen. First, we audit a common projection assumption under oracle paired clean and triggered features. Projection succeeds mainly on BadNets and leaves WaNet, Blended, and SIG at attack success rates (ASR) of 0.683, 0.888, and 0.941, respectively, on CIFAR-10 with ResNet-18. This failure is not explained by spectral compactness, spatial locality, or subspace misalignment. Instead, it is predicted by a logit-triplet gap involving the target margin, the target-logit drop, and the non-target logit rise. We then introduce InstantForget, a clean-calibrated gated reset that flags anomalous features with a Mahalanobis score and moves only flagged features toward a neutral non-target representation.
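The gated reset described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say which layer's features are used, how the threshold is calibrated, or how the neutral representation is built. The function names (`fit_clean_stats`, `mahalanobis_sq`, `gated_reset`), the 99th-percentile threshold, the ridge term `eps`, the blend factor `alpha`, the use of the clean mean as the neutral point, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_clean_stats(clean_feats, eps=1e-3):
    """Estimate the mean and a regularised inverse covariance from clean features."""
    mu = clean_feats.mean(axis=0)
    cov = np.cov(clean_feats, rowvar=False)
    cov += eps * np.eye(cov.shape[0])  # assumed ridge term to keep the inverse stable
    return mu, np.linalg.inv(cov)


def mahalanobis_sq(feats, mu, cov_inv):
    """Squared Mahalanobis distance of each row of feats from the clean mean."""
    diff = feats - mu
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)


def gated_reset(feats, mu, cov_inv, threshold, neutral, alpha=1.0):
    """Move only flagged rows toward `neutral`; unflagged rows are left untouched."""
    flagged = mahalanobis_sq(feats, mu, cov_inv) > threshold
    out = feats.copy()
    out[flagged] = (1.0 - alpha) * feats[flagged] + alpha * neutral
    return out, flagged


rng = np.random.default_rng(0)

# Synthetic stand-in for clean features (hypothetical, 8-dimensional).
clean = rng.normal(size=(2000, 8))
mu, cov_inv = fit_clean_stats(clean)

# Assumed calibration: threshold at the 99th percentile of clean scores.
threshold = np.quantile(mahalanobis_sq(clean, mu, cov_inv), 0.99)

# Synthetic "triggered" features: clean-like noise plus a large fixed shift.
triggered = rng.normal(size=(100, 8)) + 6.0

# Placeholder neutral representation; the paper's construction is not given here.
neutral = mu

reset_feats, flagged = gated_reset(triggered, mu, cov_inv, threshold, neutral)
clean_out, clean_flagged = gated_reset(clean, mu, cov_inv, threshold, neutral)

print("triggered flagged:", flagged.mean())
print("clean flagged:", clean_flagged.mean())
```

Because the gate is calibrated on clean data only, the model weights stay frozen and clean inputs below the threshold pass through unchanged, which is the property the update-free setting relies on. In a real pipeline the features would presumably come from a frozen network, and the reset features would be fed to the remaining layers.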
