Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

arXiv cs.CL, 2026-06-01. Authors: Benedetta Muscato, Beiduo Chen, Gizem Gezici, Barbara Plank, Fosca Giannotti

Abstract

arXiv:2605.31563v1

Human disagreement in labeling is ubiquitous and well documented. However, variation in explanations, captured through token-level human rationales, remains far less explored. At the same time, it is unclear how best to evaluate human labels and rationales, or even how best to aggregate rationales beyond majority vote, in light of this variation. Yet rationales may provide additional insight into the richness of human reasoning, which can differ in style, values, and interpretation, especially in subjective NLP tasks such as hate speech detection. In this work, we unify diverse models, training strategies, loss functions, and existing evaluation metrics under a single protocol by systematically re-implementing them across different label and rationale representation spaces.
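To make the aggregation question concrete, the following is a minimal sketch, not the authors' implementation, contrasting majority-vote aggregation of token-level rationales with a soft alternative that keeps the per-token share of annotators. The function names, the binary-mask representation, and the toy data are all assumptions made for illustration.

```python
from typing import List


def majority_vote(masks: List[List[int]]) -> List[int]:
    """Hard aggregation: keep a token only if more than half of the
    annotators marked it as part of their rationale."""
    n = len(masks)
    return [1 if 2 * sum(column) > n else 0 for column in zip(*masks)]


def soft_aggregate(masks: List[List[int]]) -> List[float]:
    """Soft aggregation: per-token fraction of annotators who marked it,
    so minority selections stay visible instead of being zeroed out."""
    n = len(masks)
    return [sum(column) / n for column in zip(*masks)]


# Hypothetical toy data: three annotators, one five-token post.
# Each row is one annotator's binary rationale mask over the tokens.
masks = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
]

hard = majority_vote(masks)
soft = soft_aggregate(masks)
```

In this toy case the hard mask keeps only the second token, while the soft vector still records that three other tokens were each selected by one annotator, which is the kind of variation the abstract argues is lost under majority vote.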
