Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection
Type: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM
arXiv:2605.31563v1 (Announce Type: new)
Abstract: Human disagreement in labeling is ubiquitous and well known. However, variation in explanations, captured through token-level human rationales, remains far less explored. At the same time, it is unclear how best to evaluate human labels and rationales, or even how best to aggregate rationales beyond majority vote, in light of this variation.
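The majority-vote baseline mentioned in the abstract can be illustrated with a minimal Python sketch, assuming each annotator's rationale is a binary mask over the same tokenization (1 = token marked as part of the rationale). The function names `majority_vote_rationale` and `token_agreement` are hypothetical. The per-token agreement vector is only one simple way to keep the variation that majority vote discards; it is not the aggregation method proposed in the paper.

```python
from typing import List, Sequence


def majority_vote_rationale(masks: Sequence[Sequence[int]]) -> List[int]:
    """Aggregate binary token-level rationale masks by strict majority vote.

    Assumption: masks[a][t] == 1 if annotator a marked token t.
    A token is kept (1) only if MORE than half of the annotators marked it,
    so ties are dropped.
    """
    if not masks:
        raise ValueError("need at least one annotator mask")
    n_tokens = len(masks[0])
    if any(len(m) != n_tokens for m in masks):
        raise ValueError("all masks must cover the same tokens")
    n_annotators = len(masks)
    return [
        1 if 2 * sum(m[t] for m in masks) > n_annotators else 0
        for t in range(n_tokens)
    ]


def token_agreement(masks: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of annotators who marked each token (a 'soft' rationale).

    Illustrative only: keeps the disagreement that majority vote collapses.
    """
    if not masks:
        raise ValueError("need at least one annotator mask")
    n_annotators = len(masks)
    return [sum(m[t] for m in masks) / n_annotators for t in range(len(masks[0]))]


if __name__ == "__main__":
    # Three hypothetical annotators marking rationales over four tokens.
    masks = [
        [0, 1, 0, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 1],
    ]
    print(majority_vote_rationale(masks))                  # [0, 1, 0, 1]
    print([round(x, 2) for x in token_agreement(masks)])   # [0.33, 0.67, 0.0, 1.0]
```

In this toy case, majority vote keeps the second token and drops the first, yet one annotator disagreed on each of them; the agreement vector (1/3 and 2/3) records that variation instead of erasing it.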