LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations?


Details

Source site: ArXiv CS.AI
Authors: Xiaohan Wang, Mingze Yin, Yilin Zhao, Gang Liu, Dian Li
Article type: NEWS
Language: en
Published: 2026-05-27

Abstract

arXiv:2605.26781v1 (announce type: new)

Advanced Large Multimodal Models (LMMs) have demonstrated impressive performance in K-12 reasoning tasks, exhibiting great promise as intelligent tutors. Realizing this potential requires models to navigate real-world examinations effectively, yet most existing benchmarks fail to capture the complexity of authentic testing environments. Specifically, most datasets are static, prone to data contamination, and often confined to restricted modalities, disciplines, and evaluation criteria. To address these issues, we introduce LiveK12Bench, a dynamic, holistic, multi-disciplinary benchmark designed to evaluate the reasoning abilities of LMMs in realistic examination scenarios. LiveK12Bench comprises 2K+ verified questions spanning Mathematics, Physics, Chemistry, and Biology, sourced from the latest real-world exam papers and designed to grow over time.
