LLM Judges Inconsistently Disagree Across Safety Criteria and Harm Categories
Event | PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM
arXiv:2605.31381v1 | Announce Type: new

Abstract: We evaluate the consistency of automated judges in conducting a multi-dimensional safety evaluation in a reference-free setup. Our results indicate that Large Language Models are unreliable judges in identifying safety issues related to machine-generated advice in regulated domains such as finance, although they are more reliable at identifying more overt forms of unsafe …
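Judge consistency is often quantified with a chance-corrected agreement statistic computed separately for each safety criterion. The abstract does not say which measure the authors use, so the sketch below is only an assumption. It uses Cohen's kappa between two judges' labels. The names `cohens_kappa` and `verdicts`, the criterion keys, and the "safe"/"unsafe" labels are all hypothetical, and the verdicts are invented toy values rather than data from the paper.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same items (hypothetical helper)."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items where both judges give the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each judge's marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both judges used one identical label throughout; kappa is undefined,
        # so by convention report perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy verdicts from two hypothetical judges on the same five responses,
# one pair of label lists per safety criterion (invented, not from the paper).
verdicts = {
    "overt_harm": (
        ["unsafe", "safe", "safe", "unsafe", "safe"],
        ["unsafe", "safe", "safe", "unsafe", "safe"],
    ),
    "regulated_advice": (
        ["unsafe", "safe", "unsafe", "safe", "safe"],
        ["safe", "safe", "unsafe", "unsafe", "safe"],
    ),
}

if __name__ == "__main__":
    for criterion, (judge_a, judge_b) in verdicts.items():
        print(f"{criterion}: kappa = {cohens_kappa(judge_a, judge_b):.2f}")
```

Reporting kappa per criterion, rather than one pooled score, shows where judges diverge. Raw percent agreement can look high simply because most responses share the majority label.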
Source: arXiv cs.CL, 2026-06-01