Neuron-Level Interventions for Gendered and Gender-Neutral Generation in Language Models

arXiv cs.CL, 2026-06-01. Authors: Zhiwen You, Nafiseh Nikeghbal, Jana Diesner

Abstract

arXiv:2605.30717v1

Language models (LMs) can produce gendered language and stereotypes even when given neutral prompts. Most prior work on gender bias in LMs examines gender through a binary lens (feminine vs. masculine), with limited attention to gender-neutral forms such as they/them pronouns or neutrally phrased job titles. How gender-related signals are encoded in the internal representations of LMs remains an open question. In this work, we study gender-specific neurons in LMs across three categories: feminine, masculine, and gender-neutral. We propose a neuron-level intervention method to identify neurons that are strongly tied to each gender category. We then test these neurons through controlled generation, showing that activating or masking gender-related neurons can steer a sentence toward a target gender form while preserving its original meaning.
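
To make the idea of activating or masking neurons concrete, here is a minimal toy sketch in plain Python. It is not the authors' method: the abstract does not say how neurons are scored or how the intervention is applied, so the scoring rule (mean activation on the target category minus the mean on the other categories), the function names `select_neurons` and `intervene`, and all activation values are assumptions made up for illustration.

```python
from typing import Dict, List, Sequence


def mean_per_neuron(rows: Sequence[Sequence[float]]) -> List[float]:
    """Average each neuron's activation over a set of examples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def select_neurons(acts: Dict[str, Sequence[Sequence[float]]],
                   target: str, k: int) -> List[int]:
    """Hypothetical scoring rule (an assumption, not from the paper):
    mean activation on the target category minus the mean over all
    other categories; return the indices of the k highest scores."""
    target_mean = mean_per_neuron(acts[target])
    other_rows = [row for cat, rows in acts.items() if cat != target
                  for row in rows]
    other_mean = mean_per_neuron(other_rows)
    scores = [t - o for t, o in zip(target_mean, other_mean)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def intervene(hidden: Sequence[float], neuron_ids: Sequence[int],
              mode: str, value: float = 1.0) -> List[float]:
    """Return a copy of `hidden` with the chosen neurons either masked
    (set to 0) or activated (set to `value`)."""
    if mode not in ("mask", "activate"):
        raise ValueError("mode must be 'mask' or 'activate'")
    out = list(hidden)
    for i in neuron_ids:
        out[i] = 0.0 if mode == "mask" else value
    return out


# Toy activations: 3 examples per category, 4 neurons each (made-up numbers).
acts = {
    "feminine":  [[0.9, 0.1, 0.2, 0.1], [0.8, 0.2, 0.1, 0.1], [1.0, 0.0, 0.3, 0.1]],
    "masculine": [[0.1, 0.9, 0.2, 0.1], [0.2, 0.8, 0.1, 0.1], [0.0, 1.0, 0.3, 0.1]],
    "neutral":   [[0.1, 0.1, 0.9, 0.1], [0.2, 0.2, 0.8, 0.1], [0.0, 0.0, 1.0, 0.1]],
}

neutral_ids = select_neurons(acts, "neutral", k=1)
feminine_ids = select_neurons(acts, "feminine", k=1)

hidden = [0.7, 0.1, 0.05, 0.1]  # one hypothetical hidden state
steered = intervene(hidden, neutral_ids, "activate", value=1.0)
masked = intervene(hidden, feminine_ids, "mask")
```

In a real model the same two operations would typically be applied to a layer's activations during the forward pass, and the selected neurons would come from whatever identification procedure the paper defines.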
