JudgmentBench: Comparing Rubric and Preference Evaluation for Quality Assessment (Event)

PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM

arXiv:2605.25240v1. Announce Type: new. Abstract: Two methodologies dominate current benchmarking practice: rubric-based scoring evaluates items against predefined criteria, whereas comparative judgment elicits pairwise preferences between outputs. Although both methodologies are widely used, the choice between them is rarely justified. We release JudgmentBench, a benchmark of 30 real-world legal tasks, paired wi