Event: BenchEvolver: Frontier Task Synthesis via Solution-Centric Evolution

PRODUCT_LAUNCH · 2026-06-02 · Impact: MEDIUM

arXiv:2606.01286v1 · Announce Type: cross

Abstract: The rapid progress of frontier large language models has led to widespread benchmark saturation, limiting the ability of existing datasets to differentiate model capabilities or provide useful training signal. For instance, on LiveCodeBench, frontier models achieve over 99% Pass@1 on easy splits and exceed 90% Pass@1 on average across difficulty levels. Constructing new, challe
