Event: LLM Judges Inconsistently Disagree Across Safety Criteria and Harm Categories

Name: LLM Judges Inconsistently Disagree Across Safety Criteria and Harm Categories
Start: 2026-06-01

PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM

arXiv:2605.31381v1
Announce Type: new
Abstract: We evaluate the consistency of automated judges in conducting a multi-dimensional safety evaluation in a reference-free setup. Our results indicate that Large Language Models are unreliable judges in identifying safety issues related to machine-generated advice in regulated domains such as finance, although they are more reliable at identifying more overt forms of unsafe

Artificial Intelligence
