AI Cartography: Mapping the Latent Landscape of AI Benchmark Ecosystems

ArXiv CS.AI | 2026-05-26 | NEWS | en | Authors: Michael Hardy, Anka Reuel, Lijin Zhang, Jodi M. Casabianca, Sang Truong, Yash Dave, Hansol Lee, Benjamin Domingue, Sanmi Koyejo

Details

Source site: ArXiv CS.AI
Authors: Michael Hardy, Anka Reuel, Lijin Zhang, Jodi M. Casabianca, Sang Truong, Yash Dave, Hansol Lee, Benjamin Domingue, Sanmi Koyejo
Article type: NEWS
Language: en
Publication date: 2026-05-26

Original text

Abstract

arXiv:2605.25272v1 (announce type: new)

While aggregate leaderboard scores drive AI development, they contain substantial measurement noise whose sources and magnitudes remain unquantified, making it unclear when rankings reflect genuine capability differences rather than evaluation artifacts. We introduce a framework for measuring the latent landscape in AI benchmark ecosystems. Applying Confirmatory Factor Analysis (CFA) and Generalizability Theory to 4,000+ models from the Open LLM Leaderboard, we decompose sources of ranking variance and establish: (1) structures assumed in current reporting practice underestimate the strength of relationships between benchmarks; (2) leaderboard items show evidence of local dependence, which undermines the use of benchmarks as measurement instruments under current scoring systems; (3) contributor metadata explains more rank-relevant variance ($\approx 9\%$) than architecture or deployment categories in this context;
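To make the idea of decomposing score variance concrete, here is a minimal sketch of a textbook one-facet Generalizability Theory study on a fully crossed models-by-benchmarks score matrix. It is not the authors' pipeline: the abstract does not specify their design, so the single-facet layout, the function name `g_study`, and the returned keys are all assumptions made for illustration.

```python
import numpy as np

def g_study(scores):
    """One-facet crossed G-study (hypothetical helper, not from the paper).

    scores: 2-D array, rows = models, columns = benchmarks,
            one score per cell, no missing values.
    Returns variance components estimated from ANOVA mean squares.
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_i = x.shape                      # number of models, number of benchmarks
    grand = x.mean()
    row_means = x.mean(axis=1)              # per-model mean
    col_means = x.mean(axis=0)              # per-benchmark mean

    # Sums of squares for the two main effects and the residual.
    ss_p = n_i * np.sum((row_means - grand) ** 2)
    ss_i = n_p * np.sum((col_means - grand) ** 2)
    ss_res = np.sum((x - grand) ** 2) - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Expected-mean-square estimators, clamped at zero.
    var_res = max(ms_res, 0.0)
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    var_i = max((ms_i - ms_res) / n_p, 0.0)

    # Relative generalizability coefficient for a mean over n_i benchmarks.
    denom = var_p + var_res / n_i
    g_coef = var_p / denom if denom > 0 else 0.0

    return {
        "var_model": var_p,
        "var_benchmark": var_i,
        "var_residual": var_res,
        "g_coef": g_coef,
    }

# Purely additive toy data: model effects [0, 1, 2], benchmark effects [0, 10].
toy = [[0, 10], [1, 11], [2, 12]]
result = g_study(toy)
```

On the toy matrix every score is a model effect plus a benchmark effect, so the residual component is zero and the model ranking is perfectly reproducible across benchmarks. Real leaderboard data would also need additional facets (for example contributor or architecture) and handling of missing cells, which this sketch omits.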
