Cultural Binding Heads in Language Models

arXiv cs.CL | 2026-05-28 | Authors: Avrile Floro, Luca Benedetto

Abstract

arXiv:2605.28543v1. LLMs often default to treating cultural groups identically even when context warrants differentiation, which reflects a lack of difference awareness. Using mechanistic interpretability and a factorial design on the N4 cultural appropriation benchmark from Wang et al. (2025), we identify 2-3 mid-layer attention heads per model that contribute causally to cultural binding across eight models (four architectures, each in base and instruct variants). Cultural binding is the process of associating cultural items with the appropriate identity. Knocking out the identity-to-item edges on these heads lowers binding strength by 9-23%. The identified heads transfer from instruct to base models, suggesting that cultural binding emerges during pre-training.
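The abstract does not say how the edge knockout is implemented. As a rough illustration only, one common way to sever a single attention edge is to set its pre-softmax score to negative infinity, so the query token (here, a cultural item) can no longer read from the key token (here, an identity mention) in that head. The function name head_attention, the toy scores, and the token positions below are all hypothetical and are not taken from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax that tolerates -inf entries."""
    m = max(row)
    if m == -math.inf:
        raise ValueError("every key is masked for this query position")
    exps = [math.exp(x - m) for x in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def head_attention(scores, knocked_out=()):
    """Causal attention weights for one head, with optional edge knockout.

    scores: square list of lists of pre-softmax logits, scores[query][key].
    knocked_out: iterable of (query_pos, key_pos) edges to sever.
    Assumption: knockout means masking the logit before the softmax.
    """
    blocked = set(knocked_out)
    n = len(scores)
    weights = []
    for q in range(n):
        row = []
        for k in range(n):
            if k > q or (q, k) in blocked:
                row.append(-math.inf)  # causal mask or knocked-out edge
            else:
                row.append(scores[q][k])
        weights.append(softmax(row))
    return weights


# Toy 4-token sequence (hypothetical): position 1 is the identity token,
# position 3 is the cultural item token. All logits are equal.
toy_scores = [[0.0] * 4 for _ in range(4)]

baseline = head_attention(toy_scores)
ablated = head_attention(toy_scores, knocked_out=[(3, 1)])

# After knockout, the item token puts exactly 0.0 weight on the identity
# token, and the remaining weights are renormalized to sum to 1.
item_to_identity_before = baseline[3][1]
item_to_identity_after = ablated[3][1]
```

In a real model the same masking would be applied inside the chosen heads during a forward pass, and the effect would be read off a downstream metric such as the binding strength mentioned in the abstract, whose exact definition is not given here.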
