Beyond Transfer Accuracy: Faithful Circuits for Controlled Low-Resource Adaptation

ArXiv CS.CL · 2026-05-27 · Authors: Khumaisa Nur'aini, Ayu Purwarianti, Alham Fikri Aji, Derry Wijaya

Abstract

arXiv:2601.08146v3 (announce type: replace)

Existing circuit discovery methods rely on templated tasks with clean counterfactuals, limiting their use on diverse natural text. We adapt Contextual Decomposition for Transformers (CD-T) for unstructured settings via label-balanced activation means and task-directional relevance scoring, enabling counterfactual-free circuit discovery. We leverage these circuits for Circuit-Targeted Supervised Fine-Tuning (CT-SFT), restricting parameter updates to task-relevant heads and LayerNorm. Experiments on NusaX cross-lingual sentiment transfer show that CT-SFT is highly competitive for low-resource adaptation. While non-circuit sparse updates and full fine-tuning sometimes match target accuracy through capacity recruitment, CT-SFT uniquely minimizes catastrophic forgetting, preserving source-language and related-task performance.
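To make two ideas from the abstract concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the label-balanced mean is computed or how updates are restricted. The function names (`label_balanced_mean`, `circuit_targeted_step`), the parameter naming scheme, and the plain SGD update are all hypothetical. The first function averages per-label mean activations so that a majority label does not dominate the baseline; the second applies a gradient step only to heads listed in a given circuit and to LayerNorm parameters, leaving everything else frozen.

```python
from collections import defaultdict


def label_balanced_mean(activations, labels):
    """Mean activation in which every label contributes equally.

    Assumption: "label-balanced" is read here as the unweighted average
    of the per-label mean vectors; the paper may define it differently.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(activations, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    # One mean vector per label, then an unweighted average across labels.
    class_means = [[s / counts[lab] for s in sums[lab]] for lab in sums]
    dim = len(class_means[0])
    return [sum(m[i] for m in class_means) / len(class_means) for i in range(dim)]


def circuit_targeted_step(params, grads, circuit_heads, lr=0.1):
    """One SGD step restricted to circuit heads and LayerNorm parameters.

    Hypothetical naming scheme: keys are ("head", layer, head_index) or
    ("layernorm", layer); values are flat lists of floats.
    circuit_heads is a set of (layer, head_index) pairs.
    """
    updated = {}
    for name, values in params.items():
        is_layernorm = name[0] == "layernorm"
        is_circuit_head = name[0] == "head" and name[1:] in circuit_heads
        if is_layernorm or is_circuit_head:
            updated[name] = [p - lr * g for p, g in zip(values, grads[name])]
        else:
            updated[name] = list(values)  # frozen: copied unchanged
    return updated
```

With three "pos" examples at `[1.0, 0.0]` and one "neg" example at `[3.0, 4.0]`, the plain mean would be `[1.5, 1.0]`, while `label_balanced_mean` returns `[2.0, 2.0]`. In a real framework the same restriction is usually expressed by disabling gradients for frozen parameters or masking gradients per head rather than by copying values.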