Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

ArXiv CS.CL | 2026-06-01 | Authors: Benedetta Muscato, Beiduo Chen, Gizem Gezici, Barbara Plank, Fosca Giannotti

Abstract

arXiv:2605.31563v1. Announce type: new. Human disagreement in labeling is ubiquitous and well known. However, variation in explanations, captured through token-level human rationales, remains far less explored. At the same time, it is unclear how best to evaluate human labels and rationales, or even how best to aggregate rationales beyond majority vote, in light of this variation. Yet rationales may provide additional insight into the richness of human reasoning, which may differ in style, values, and interpretation, especially in subjective NLP tasks such as hate speech detection. In this work, we unify diverse models, training strategies, loss functions, and existing evaluation metrics under a single protocol by systematically re-implementing them across different label and rationale representation spaces.
