Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation

ArXiv CS.CL | 2026-06-05 | NEWS | en | Authors: David Gringras, Misha Salahshoor

Details

Source site
ArXiv CS.CL
Authors
David Gringras, Misha Salahshoor
Article type
NEWS
Language
en
Publication date
2026-06-05

Abstract

arXiv:2605.04135v2 (announce type: replace-cross)

Readers of applied-domain LLM capability evaluations want to know what AI systems can currently do. That literature answers a related, but consequentially different, question: what older, cheaper, less-elicited models could do months or years earlier (a 2026 paper evaluating GPT-3.5 or GPT-4 zero-shot, say, against a frontier of reasoning-capable, tool-using systems like GPT-5.5 Pro and Claude Opus 4.7), often reported with sparse configuration details and abstracted upward into claims about "AI" that propagate through citations, media, and policy. We measure the "publication elicitation gap" (the gap between these answers) in a pre-registered audit of 112,303 LLM-keyword-matched candidate records (2022-01 to 2026-04; 18,574 admissible, 4,766 full-paper texts retrievable), comparing tested models to the contemporaneous frontier on the Epoch AI Capabilities Index (ECI), reproduced under Arena Elo and Artificial Analysis.
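
To make the comparison in the abstract concrete, here is a minimal Python sketch of one plausible way to compare a tested model with the contemporaneous frontier on a capability index. It is an illustration under stated assumptions, not the authors' procedure: the model names, release dates, and scores are hypothetical placeholders (not real ECI values), and `ModelRelease`, `frontier_at`, and `elicitation_gap` are invented helper names. The abstract does not specify how the gap is computed, so the score difference and release lag used here are assumed stand-ins.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ModelRelease:
    name: str        # hypothetical model label
    released: date   # public release date (placeholder)
    score: float     # placeholder index score, NOT a real ECI value


def frontier_at(releases, when):
    """Return the highest-scoring model released on or before `when`."""
    available = [r for r in releases if r.released <= when]
    if not available:
        raise ValueError(f"no model released on or before {when}")
    return max(available, key=lambda r: r.score)


def elicitation_gap(releases, tested_name, published):
    """Compare a tested model with the frontier at the paper's publication date.

    Returns (score_gap, lag_days): the frontier score minus the tested
    model's score, and the number of days between the two release dates.
    Both quantities are assumed operationalisations for illustration.
    """
    by_name = {r.name: r for r in releases}
    tested = by_name[tested_name]
    frontier = frontier_at(releases, published)
    score_gap = frontier.score - tested.score
    lag_days = (frontier.released - tested.released).days
    return score_gap, lag_days


# Hypothetical release history; every value below is made up.
RELEASES = [
    ModelRelease("model-a", date(2023, 3, 1), 100.0),
    ModelRelease("model-b", date(2024, 6, 1), 120.0),
    ModelRelease("model-c", date(2025, 9, 1), 145.0),
]

# A hypothetical paper published 2026-01-15 that evaluated only "model-a".
score_gap, lag_days = elicitation_gap(RELEASES, "model-a", date(2026, 1, 15))
print(f"score gap: {score_gap}, release lag: {lag_days} days")
```

In this toy setup the frontier at publication is `model-c`, so the paper's tested model trails it in both score and release date. A real audit would replace the placeholder table with published index values and would need a rule for papers that test several models.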