Neuron-Level Interventions for Gendered and Gender-Neutral Generation in Language Models

Source: arXiv cs.CL
Date: 2026-06-01
Authors: Zhiwen You, Nafiseh Nikeghbal, Jana Diesner

Abstract

arXiv:2605.30717v1 (announce type: new)

Language models (LMs) can produce gendered language and stereotypes even when given neutral prompts. Most prior work on gender bias in LMs examines gender through a binary lens (feminine vs. masculine), with limited attention to gender-neutral forms such as they/them pronouns or neutrally phrased job titles. How gender-related signals are encoded in the internal representations of LMs remains an open question. In this work, we study gender-specific neurons in LMs across three categories: feminine, masculine, and gender-neutral. We propose a neuron-level intervention method to identify neurons that are strongly tied to each gender category. We then test these neurons through controlled generation, showing that activating or masking gender-related neurons can steer a sentence toward a target gender form while preserving its original meaning.
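The abstract does not say how neurons are scored or how the intervention is applied inside the model, so the following is only a generic toy illustration of the "identify, then activate or mask" idea, not the authors' method. The scoring rule (mean activation on the target category minus the mean over the other categories), the function names `top_neurons` and `intervene`, and the activation values are all assumptions made for this sketch.

```python
from statistics import mean

def top_neurons(acts, target, k):
    """Return indices of the k neurons most tied to `target`.

    `acts` maps a category name to a list of activation vectors,
    one vector per sentence. The score used here is an assumed
    heuristic: mean activation on the target category minus the
    mean of the per-category means of all other categories.
    """
    n = len(acts[target][0])

    def avg(cat, j):
        return mean(vec[j] for vec in acts[cat])

    scores = []
    for j in range(n):
        others = mean(avg(c, j) for c in acts if c != target)
        scores.append(avg(target, j) - others)
    ranked = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

def intervene(hidden, neurons, mode, value=1.0):
    """Return a copy of `hidden` with `neurons` masked or activated.

    mode="mask" sets the chosen neurons to 0.0;
    mode="activate" sets them to `value`.
    """
    if mode not in ("mask", "activate"):
        raise ValueError("mode must be 'mask' or 'activate'")
    out = list(hidden)
    for j in neurons:
        out[j] = 0.0 if mode == "mask" else value
    return out

# Made-up activations for a 4-neuron layer (illustrative only).
acts = {
    "feminine":  [[0.9, 0.1, 0.2, 0.5], [0.8, 0.0, 0.1, 0.5]],
    "masculine": [[0.1, 0.9, 0.2, 0.5], [0.0, 0.8, 0.1, 0.5]],
    "neutral":   [[0.1, 0.1, 0.9, 0.5], [0.0, 0.0, 0.8, 0.5]],
}

fem = top_neurons(acts, "feminine", k=1)
neu = top_neurons(acts, "neutral", k=1)

# Steer a feminine-leaning hidden state toward the neutral form:
# mask the feminine neuron, then activate the neutral one.
hidden = [0.7, 0.05, 0.1, 0.5]
steered = intervene(intervene(hidden, fem, "mask"), neu, "activate")
```

In a real LM the same two steps would operate on hidden activations captured during a forward pass, and the edited activations would be written back before generation continues; how the paper does this is not stated in the abstract.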
