Mechanistic Interpretability as Statistical Estimation: A Variance Analysis

Event: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM

arXiv:2510.00845v4 Announce Type: replace-cross

Abstract: Mechanistic Interpretability (MI) aims to reverse-engineer model behaviors by identifying functional sub-networks. Yet, the scientific validity of these findings depends on their stability. In this work, we argue that circuit discovery is not a standalone task but a statistical estimation problem built upon causal mediation analysis (CMA). We uncover a fundamenta
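
To make the "estimation" framing concrete, here is a minimal sketch of how the stability of an effect estimate could be probed with a bootstrap. It is not the paper's method: the abstract is truncated before any procedure is described. The function names `effect_estimate` and `bootstrap_std`, and the synthetic per-example scores, are assumptions made purely for illustration.

```python
import random
import statistics


def effect_estimate(samples):
    """Hypothetical estimator: the mean of per-example effect scores.

    Stands in for a component's estimated effect; a real pipeline would
    compute these scores from model interventions.
    """
    return statistics.fmean(samples)


def bootstrap_std(samples, n_resamples=1000, seed=0):
    """Standard deviation of the estimator across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = [
        # Resample the dataset with replacement and re-run the estimator.
        effect_estimate(rng.choices(samples, k=n))
        for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)


# Synthetic per-example effect scores (assumed data, not from the paper).
data_rng = random.Random(42)
scores = [data_rng.gauss(0.3, 1.0) for _ in range(200)]

point = effect_estimate(scores)
spread = bootstrap_std(scores)
print(f"estimate = {point:.3f}, bootstrap std = {spread:.3f}")
```

A large bootstrap spread relative to the point estimate would suggest that a conclusion drawn from this dataset may not hold on another sample, which is the kind of instability the abstract raises.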