Agreement Between Large Language Models and Human Raters in Essay Scoring: A Research Synthesis

PRODUCT_LAUNCH | 2026-05-27 | Impact: MEDIUM

arXiv:2512.14561v2 (Announce Type: replace)

Abstract: Despite the growing promise of large language models (LLMs) in automated essay scoring (AES), empirical findings regarding their reliability compared to human raters remain mixed. Following the PRISMA 2020 guidelines, we synthesized 65 published and unpublished studies from January 2022 to August 2025 that examined agreement between LLM-generated sc
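The excerpt above does not say which agreement statistic the synthesized studies report. As an illustration only, assuming quadratic weighted kappa (QWK), a statistic widely used in AES work to compare two raters on an ordinal score scale, a minimal Python sketch of measuring agreement between a human rater and an LLM could look like the following. The function name, its parameters, and the score range are hypothetical and are not taken from the paper.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Hypothetical helper: Cohen's kappa with quadratic weights for two raters.

    Scores are assumed to be integers in [min_score, max_score].
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    if max_score <= min_score:
        raise ValueError("max_score must exceed min_score")
    for s in list(rater_a) + list(rater_b):
        if not min_score <= s <= max_score:
            raise ValueError(f"score {s} outside [{min_score}, {max_score}]")

    k = max_score - min_score + 1  # number of score categories
    n = len(rater_a)

    # Observed matrix: observed[i][j] counts essays scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score histograms, used to build the chance-expected matrix.
    hist_a = Counter(a - min_score for a in rater_a)
    hist_b = Counter(b - min_score for b in rater_b)

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            # Quadratic weight: larger score gaps are penalized more heavily.
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0:
        raise ValueError("kappa undefined: no chance-expected disagreement")
    return 1.0 - numerator / denominator


# Hypothetical scores on a 1 to 6 scale for five essays.
human = [2, 3, 4, 4, 5]
llm = [2, 3, 3, 4, 6]
print(round(quadratic_weighted_kappa(human, llm, 1, 6), 3))
```

Identical score lists give a kappa of 1.0, perfectly reversed scores on a 1 to 3 scale give approximately -1.0, and values near 0 mean agreement no better than chance given each rater's score distribution. Quadratic weights are a design choice suited to ordinal scales: an LLM that is off by two points is penalized four times as much as one that is off by one point.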