Attribute-Based Diagnosis of LLM Alignment with Hate Speech Annotations
Event: PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM
arXiv:2605.27025v1 (announce type: new)
Abstract: Hate speech annotation is costly, subjective, and prone to annotator disagreement, making large-scale dataset construction challenging. We systematically analyze how well large language models (LLMs) align with human judgments across ten theoretically grounded subjective attributes, such as dehumanization, violence, and sentiment, evaluating both small and large variants of Ll
Related coverage (1)
Attribute-Based Diagnosis of LLM Alignment with Hate Speech Annotations
ArXiv CS.CL, 2026-05-27