BackWeak: Backdooring Knowledge Distillation Simply with Weak Triggers and Fine-tuning

arXiv cs.CV, 2026-05-26. Authors: Shanmin Wang, Dongdong Zhao

Abstract

arXiv:2511.12046v2

Knowledge Distillation (KD) is essential for compressing large models, yet relying on pre-trained "teacher" models downloaded from third-party repositories introduces serious security risks, most notably backdoor attacks. Existing KD backdoor methods are typically complex and computationally intensive: they employ surrogate student models and simulated distillation to guarantee transferability, and they construct triggers that resemble universal adversarial perturbations (UAPs), which are not stealthy in magnitude and inherently exhibit strong adversarial behavior. This work questions whether such complexity is necessary and constructs stealthy "weak" triggers: imperceptible perturbations with negligible adversarial effect. We propose BackWeak, a simple, surrogate-free attack paradigm.