Uncovering Competency Gaps in Large Language Models and Their Benchmarks
PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM
arXiv:2512.20638v2 (Announce Type: replace)
Abstract: The evaluation of large language models relies heavily on standardized benchmarks. These benchmarks provide useful aggregated metrics, but they can obscure (i) particular sub-areas where the models are weak ("model gaps") and (ii) imbalanced coverage in the benchmarks themselves ("benchmark gaps"). To automatically uncover both types of gaps, we propose a simple new method using…
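The abstract is truncated here, so the authors' actual method is not shown. The sketch below only illustrates the underlying point: a single aggregate score can hide both kinds of gap. It assumes a hypothetical benchmark whose items already carry sub-area labels, which the paper's method presumably aims to avoid requiring. The function `gap_report`, its thresholds `acc_margin` and `coverage_ratio`, and the category names are all illustrative inventions, not taken from the paper.

```python
from collections import Counter, defaultdict


def gap_report(results, acc_margin=0.15, coverage_ratio=0.5):
    """Flag sub-areas hidden by an aggregate score (illustrative heuristic only).

    results: list of (category, is_correct) pairs -- a hypothetical format.
    A "model gap" is a category whose accuracy falls more than acc_margin
    below overall accuracy. A "benchmark gap" is a category whose share of
    items is below coverage_ratio times the share it would get under a
    uniform split across categories.
    """
    total = len(results)
    overall = sum(ok for _, ok in results) / total  # the single aggregate metric

    counts = Counter(cat for cat, _ in results)  # items per category
    correct = defaultdict(int)                   # correct answers per category
    for cat, ok in results:
        correct[cat] += ok

    uniform_share = 1 / len(counts)
    model_gaps, benchmark_gaps = [], []
    for cat in sorted(counts):
        acc = correct[cat] / counts[cat]
        share = counts[cat] / total
        if acc < overall - acc_margin:
            model_gaps.append(cat)
        if share < coverage_ratio * uniform_share:
            benchmark_gaps.append(cat)
    return overall, model_gaps, benchmark_gaps


# Hypothetical benchmark: (category, number of items, number answered correctly).
SPEC = [("algebra", 35, 32), ("geometry", 35, 31), ("probability", 20, 9), ("logic", 10, 9)]
results = [(cat, i < n_ok) for cat, n, n_ok in SPEC for i in range(n)]

overall, model_gaps, benchmark_gaps = gap_report(results)
print(f"overall={overall:.2f} model_gaps={model_gaps} benchmark_gaps={benchmark_gaps}")
# -> overall=0.81 model_gaps=['probability'] benchmark_gaps=['logic']
```

In this toy data, the headline accuracy of 0.81 looks healthy. Yet "probability" sits at 0.45, which is a model gap, and "logic" supplies only 10% of the items, which is a benchmark gap. The fixed thresholds are arbitrary choices made for the illustration.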
Source: arXiv cs.CL, 2026-06-02