LLM Judges Inconsistently Disagree Across Safety Criteria and Harm Categories · Event

PRODUCT_LAUNCH · 2026-06-01 · Impact: MEDIUM

arXiv:2605.31381v1 · Announce Type: new

Abstract: We evaluate the consistency of automated judges in conducting a multi-dimensional safety evaluation in a reference-free setup. Our results indicate that Large Language Models are unreliable judges in identifying safety issues related to machine-generated advice in regulated domains such as finance, although they are more reliable at identifying more overt forms of unsafe …
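The abstract does not say which agreement statistic the authors use. As a minimal sketch, assuming two judge models each give a binary safe/unsafe verdict per response and per criterion, Cohen's kappa is one standard way to measure how far their agreement exceeds chance. The criterion names (`overt_harm`, `financial_advice`) and all verdicts below are hypothetical illustrations, not data from the paper.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two raters labeling the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical verdicts from two judges on the same six responses,
# grouped by a made-up safety criterion.
verdicts = {
    "overt_harm": (
        ["unsafe", "unsafe", "safe", "safe", "unsafe", "safe"],
        ["unsafe", "unsafe", "safe", "safe", "unsafe", "safe"],
    ),
    "financial_advice": (
        ["unsafe", "safe", "safe", "unsafe", "safe", "safe"],
        ["safe", "safe", "unsafe", "unsafe", "safe", "unsafe"],
    ),
}

for criterion, (judge_a, judge_b) in verdicts.items():
    print(f"{criterion}: kappa = {cohens_kappa(judge_a, judge_b):.2f}")
# overt_harm: kappa = 1.00
# financial_advice: kappa = 0.00
```

A kappa near 1 means the judges agree well beyond chance; a kappa near 0 means their agreement is no better than what their individual label frequencies would produce by chance, even though they still match on half the items here.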
