Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models (Event)

Name: Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
Start: 2026-05-28

Type: PRODUCT_LAUNCH
Date: 2026-05-28
Impact: MEDIUM

Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
arXiv:2605.27997v1 (Announce Type: new)

Abstract: Large language models frequently generate toxic, hateful, or harmful content, yet existing mitigation methods rely on costly retraining or output-level filtering and offer no mechanistic insight into where toxicity originates internally. We introduce Meow2X and TRNE, two complementary retraining-free frameworks that localize toxicity to specific layers and ne…
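The abstract is truncated before it describes how Meow2X and TRNE actually work, so the following is only a generic illustration of the broader idea of retraining-free, neuron-level localization and suppression. It is not either framework's method. It assumes you have already collected per-neuron activation vectors from one chosen layer on a set of toxic prompts and a set of benign prompts. It then ranks neurons by the difference in mean activation (a simple contrastive heuristic swapped in here for illustration) and scales the top-ranked neurons at inference time. The function names `rank_neurons` and `suppress` are hypothetical.

```python
from statistics import mean


def rank_neurons(toxic_acts, benign_acts):
    """Hypothetical helper: rank neurons by how much more they fire on toxic inputs.

    toxic_acts, benign_acts: lists of activation vectors (lists of floats),
    one vector per prompt, all taken from the same layer (an assumption).
    Returns neuron indices sorted from most to least toxicity-associated.
    """
    n_neurons = len(toxic_acts[0])
    scores = []
    for j in range(n_neurons):
        # Contrastive score: mean activation on toxic prompts minus mean on benign prompts.
        diff = mean(a[j] for a in toxic_acts) - mean(a[j] for a in benign_acts)
        scores.append((diff, j))
    scores.sort(reverse=True)  # highest toxic-minus-benign difference first
    return [j for _, j in scores]


def suppress(activations, neuron_ids, scale=0.0):
    """Hypothetical helper: return a copy of one activation vector with the chosen
    neurons multiplied by `scale` (0.0 ablates them). Model weights are untouched,
    so no retraining is involved."""
    targets = set(neuron_ids)
    return [x * scale if j in targets else x for j, x in enumerate(activations)]


if __name__ == "__main__":
    toxic = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.7]]
    benign = [[0.1, 0.1, 0.5], [0.2, 0.3, 0.4]]
    ranking = rank_neurons(toxic, benign)
    print(ranking)  # [0, 2, 1]
    print(suppress(toxic[0], ranking[:1]))  # [0.0, 0.1, 0.5]
```

In a real model, the suppression step would typically be applied inside the forward pass, for example through a framework hook, rather than to a detached list. Partial scaling (for example `scale=0.5`) trades suppression strength against possible side effects on fluency.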

Artificial Intelligence
