SEAL: Can Saturated Benchmarks Be Revived by LLM-as-a-Meta-Judge? Event

Name: SEAL: Can Saturated Benchmarks Be Revived by LLM-as-a-Meta-Judge?
Start: 2026-05-29

Type: PRODUCT_LAUNCH
Date: 2026-05-29
Impact: MEDIUM

arXiv:2605.30104v1 (Announce Type: new)

Abstract: Widely used language-model benchmarks are increasingly saturated, with frontier systems often receiving near-tied scores that standard metrics cannot resolve. Rather than constructing harder alternatives, we ask whether existing tasks can be made informative again through improved evaluation over the same candidate outputs. Therefore, we present Seeded Elimination with Adaptive LLM-

Artificial Intelligence
