Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation

ArXiv CS.CL | 2026-06-05 | NEWS | en | Authors: David Gringras, Misha Salahshoor

Details

Source site: ArXiv CS.CL
Authors: David Gringras, Misha Salahshoor
Article type: NEWS
Language: en
Publication date: 2026-06-05

Abstract

arXiv:2605.04135v2. Announce type: replace-cross.

Readers of applied-domain LLM capability evaluations want to know what AI systems can currently do. That literature answers a related, but consequentially different, question: what older, cheaper, less-elicited models could do months or years earlier (a 2026 paper evaluating GPT-3.5 or GPT-4 zero-shot, say, against a frontier of reasoning-capable, tool-using systems like GPT-5.5 Pro and Claude Opus 4.7), often reported with sparse configuration details and abstracted upward into claims about "AI" that propagate through citations, media, and policy. We measure the "publication elicitation gap" (the gap between these answers) in a pre-registered audit of 112,303 LLM-keyword-matched candidate records (2022-01 to 2026-04; 18,574 admissible, 4,766 full-paper texts retrievable), comparing tested models to the contemporaneous frontier on the Epoch AI Capabilities Index (ECI), reproduced under Arena Elo and Artificial Analysis.
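The comparison of a tested model against the contemporaneous frontier can be illustrated with a minimal sketch. The abstract does not spell out how the audit operationalizes the gap, so everything below is an assumption made for illustration: the `Release` record, the `frontier_at` and `frontier_gap` helpers, the model labels, and the index scores are all hypothetical, and the scores are made-up numbers rather than ECI values.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Release:
    name: str        # hypothetical model label, not a real system
    released: date   # public release date
    score: float     # capability-index score (made-up illustrative value)


def frontier_at(releases, when):
    """Return the highest-scoring release available on or before `when`."""
    available = [r for r in releases if r.released <= when]
    if not available:
        raise ValueError("no model released on or before the given date")
    return max(available, key=lambda r: r.score)


def frontier_gap(releases, tested_name, published):
    """Score gap and release-date lag between a tested model and the frontier.

    Assumption: the gap is taken at the paper's publication date. The audit
    itself may define it differently.
    """
    tested = next(r for r in releases if r.name == tested_name)
    frontier = frontier_at(releases, published)
    return {
        "frontier": frontier.name,
        "score_gap": frontier.score - tested.score,
        "lag_days": (published - tested.released).days,
    }


# Illustrative data only: labels, dates, and scores are invented.
RELEASES = [
    Release("model-a", date(2023, 1, 1), 100.0),
    Release("model-b", date(2024, 1, 1), 120.0),
    Release("model-c", date(2025, 1, 1), 150.0),
]

result = frontier_gap(RELEASES, "model-a", date(2025, 6, 1))
print(result)
```

Here a paper published in mid-2025 that tests only the oldest model is compared against the best model available at that date. The sketch ignores elicitation settings (zero-shot versus tool use or reasoning modes), which the abstract treats as part of the gap.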
