Certified Circuits: Stability Guarantees for Mechanistic Circuits

ArXiv CS.CV, 2026-06-01. Authors: Alaa Anani, Tobias Lorenz, Bernt Schiele, Mario Fritz, Jonas Fischer

Abstract

arXiv:2602.22968v3 (announce type: replace-cross)

Understanding how neural networks arrive at their predictions is essential for debugging, auditing, and deployment. Mechanistic interpretability pursues this goal by identifying circuits, that is, minimal subnetworks responsible for specific behaviors. However, existing circuit discovery methods are brittle: circuits depend strongly on the chosen concept dataset and often fail to transfer out-of-distribution, raising doubts about whether they capture the concept or merely dataset-specific artifacts. We introduce Certified Circuits, which provide provable stability guarantees for circuit discovery. Our framework wraps any black-box discovery algorithm with randomized data subsampling to certify that inclusion decisions over circuit components (neurons or edges of the model graph, depending on the base algorithm) are invariant to bounded edit-distance perturbations of the concept dataset.
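The wrapping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the subsampling scheme, the aggregation rule, or how the certificate is computed, so everything below is an assumption made for illustration. The function names (`inclusion_frequencies`, `toy_discover`), the fixed-size subsampling without replacement, and the 0.9 frequency threshold are all hypothetical. The sketch only shows how a black-box discovery routine might be run on random subsets and its per-component inclusion decisions aggregated; it does not compute any edit-distance certificate.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, Sequence, Set


def inclusion_frequencies(
    dataset: Sequence,
    discover: Callable[[Sequence], Set[Hashable]],
    n_rounds: int = 50,
    subset_size: int = 10,
    seed: int = 0,
) -> Dict[Hashable, float]:
    """Run a black-box `discover` on random subsets and return, for each
    component it ever selected, the fraction of rounds that selected it.

    Assumption: subsets are drawn uniformly without replacement at a fixed
    size; the paper may use a different subsampling scheme.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_rounds):
        subset = rng.sample(list(dataset), subset_size)
        counts.update(discover(subset))  # one vote per selected component
    return {comp: n / n_rounds for comp, n in counts.items()}


def toy_discover(subset: Sequence[Set[str]]) -> Set[str]:
    """Hypothetical stand-in for a discovery algorithm: a component is
    'in the circuit' if it is active in more than half of the examples."""
    active = Counter(c for example in subset for c in example)
    return {c for c, n in active.items() if n > len(subset) / 2}


# Toy concept dataset: each example is the set of components active on it.
# "a" is active everywhere; "c" is active on only half of the examples.
dataset = [{"a", "c"} for _ in range(10)] + [{"a"} for _ in range(10)]

freqs = inclusion_frequencies(dataset, toy_discover)

# Keep only components whose inclusion is consistent across subsets
# (the 0.9 threshold is an illustrative choice, not from the paper).
stable = {comp for comp, f in freqs.items() if f >= 0.9}
```

In this toy setup, a component that is active on every example is selected in every round, while a component whose selection depends on which examples happen to be drawn receives an intermediate frequency. A certification procedure would then have to turn such frequencies into a guarantee under bounded dataset edits, a step this sketch deliberately leaves out.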
