SAEmnesia: Erasing Concepts in Diffusion Models with Supervised Sparse Autoencoders · Event

BREAKTHROUGH · 2026-06-01 · Impact: HIGH

arXiv:2509.21379v3 (Announce Type: replace)

Abstract: Concept unlearning in diffusion models is hampered by feature splitting, where concepts are distributed across many latent features, making their removal challenging and computationally expensive. We introduce SAEmnesia, a supervised sparse autoencoder framework that overcomes this by enforcing one-to-one concept-neuron mappings. By systematically labeling conc…
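The abstract's central idea, reserving one sparse-autoencoder latent per concept so that erasing a concept means intervening on a single neuron instead of many split features, can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration, not the paper's method: the standard ReLU SAE form, the function names (`sae_encode`, `sae_decode`, `supervised_sae_loss`, `erase_concept`), the hinge-style supervision term with its `margin`, `l1_coef` and `sup_coef` values, and the hypothetical `concept_to_latent` mapping. The abstract is cut off before it describes the actual loss or intervention.

```python
import numpy as np


def sae_encode(x, W_enc, b_enc, b_dec):
    """ReLU encoder: z = ReLU((x - b_dec) @ W_enc + b_enc), shape (batch, n_latents)."""
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)


def sae_decode(z, W_dec, b_dec):
    """Linear decoder: x_hat = z @ W_dec + b_dec, shape (batch, d_model)."""
    return z @ W_dec + b_dec


def supervised_sae_loss(x, concept_ids, params, concept_to_latent,
                        l1_coef=1e-3, sup_coef=1.0, margin=1.0):
    """Hypothetical supervised SAE objective (illustrative, not the paper's loss).

    x:                 (batch, d_model) activations taken from a diffusion model
    concept_ids:       (batch,) integer concept label per sample, -1 = unlabeled
    concept_to_latent: dict mapping concept id -> reserved latent index
    """
    W_enc, b_enc, W_dec, b_dec = params
    z = sae_encode(x, W_enc, b_enc, b_dec)
    x_hat = sae_decode(z, W_dec, b_dec)

    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))  # reconstruction error
    sparsity = np.mean(np.sum(np.abs(z), axis=1))       # L1 sparsity penalty

    reserved = np.array(sorted(concept_to_latent.values()))
    sup_terms = []
    for i, c in enumerate(concept_ids):
        if c < 0:
            continue  # unlabeled samples only get reconstruction + sparsity
        k = concept_to_latent[int(c)]
        # Hinge term: push the latent reserved for this concept above `margin` ...
        pull = max(0.0, margin - z[i, k])
        # ... and keep the other concepts' reserved latents silent (one-to-one).
        push = np.sum(z[i, reserved[reserved != k]])
        sup_terms.append(pull + push)
    supervision = float(np.mean(sup_terms)) if sup_terms else 0.0

    return recon + l1_coef * sparsity + sup_coef * supervision


def erase_concept(x, params, latent_idx):
    """Erase one concept by subtracting its reserved latent's decoder contribution.

    Because the decoder is linear, this equals x - (x_hat - x_hat_ablated), so the
    SAE's reconstruction error stays in place instead of replacing x with x_hat.
    """
    W_enc, b_enc, W_dec, b_dec = params
    z = sae_encode(x, W_enc, b_enc, b_dec)
    return x - z[:, latent_idx:latent_idx + 1] * W_dec[latent_idx]


# Toy usage with random weights (no training is performed here).
rng = np.random.default_rng(0)
d_model, n_latents = 8, 16
params = (rng.normal(size=(d_model, n_latents)) * 0.1,  # W_enc
          np.zeros(n_latents),                           # b_enc
          rng.normal(size=(n_latents, d_model)) * 0.1,  # W_dec
          np.zeros(d_model))                             # b_dec
x = rng.normal(size=(4, d_model))
labels = np.array([0, 1, -1, 0])
concept_to_latent = {0: 3, 1: 7}  # hypothetical: concept 0 -> latent 3, concept 1 -> latent 7

loss = supervised_sae_loss(x, labels, params, concept_to_latent)
x_erased = erase_concept(x, params, concept_to_latent[0])
```

The sketch only shows the objective and the intervention; in practice the weights would be optimized with an autodiff framework. Subtracting the single latent's decoder direction, rather than decoding an ablated code, is one common design choice for SAE interventions because it leaves the parts of `x` the SAE fails to reconstruct untouched.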
