STABLEVAL: Disagreement-Aware and Stable Evaluation of AI Systems (Event)

Name: STABLEVAL: Disagreement-Aware and Stable Evaluation of AI Systems
Start: 2026-06-02

Type: PRODUCT_LAUNCH
Date: 2026-06-02
Impact: MEDIUM

arXiv:2605.02122v2 (announce type: replace-cross)

Abstract: Human evaluation remains the primary standard for assessing modern AI systems, yet annotator disagreement, bias, and variability make system rankings fragile under standard majority-vote aggregation. Majority vote discards annotator reliability and item-level ambiguity, often yielding unstable comparisons across annotator subsets. We introduce STABLEVAL, a disagreement-awa
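The instability the abstract attributes to majority vote can be illustrated with a small sketch. This is not STABLEVAL's method, which the truncated abstract does not describe; it only shows plain majority-vote aggregation giving different system rankings on different annotator subsets. The vote matrix `VOTES` and the helper names are hypothetical, made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: VOTES[i][j] is the system ("A" or "B") that
# annotator j preferred on item i. Five items, five annotators.
VOTES = [
    ["A", "A", "B", "B", "A"],
    ["A", "B", "B", "A", "B"],
    ["B", "A", "A", "B", "B"],
    ["A", "B", "A", "B", "A"],
    ["B", "B", "A", "A", "A"],
]


def majority_label(labels):
    """Return the most common label (inputs here have odd length and two classes, so no ties)."""
    return Counter(labels).most_common(1)[0][0]


def winner(votes, annotators):
    """Majority-vote each item over the chosen annotators, then return the system winning more items."""
    item_labels = [majority_label([row[j] for j in annotators]) for row in votes]
    return majority_label(item_labels)


def winners_by_subset(votes, k):
    """Map every size-k annotator subset to the system it ranks first."""
    n_annotators = len(votes[0])
    return {
        subset: winner(votes, subset)
        for subset in combinations(range(n_annotators), k)
    }


if __name__ == "__main__":
    print("full panel:", winner(VOTES, range(5)))
    for subset, system in winners_by_subset(VOTES, 3).items():
        print(subset, system)
```

On this toy data the full panel prefers system A, while the three-annotator subset (1, 2, 3) prefers system B, so the ranking depends on which annotators happen to be sampled.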

Artificial Intelligence
