Event: Aligning Language Model Benchmarks with Pairwise Preferences

PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM

arXiv:2602.02898v2 (announce type: replace-cross)

Abstract: Language model benchmarks are pervasive, computationally efficient proxies for real-world performance. However, many recent works find that benchmarks often fail to predict real utility. To help bridge this gap, we introduce benchmark alignment, in which we use limited amounts of information about model performance to automatically update offline benchmarks, aiming to produce