Automated Essay Scoring and Language Certification: Assessing Generalizability, Agreement and Validity for French

arXiv cs.CL | 2026-06-02 | Authors: Rodrigo Wilkens, Rémi Cardon, Vincent Folny, Thomas François

Abstract

arXiv:2606.02009v1 (announce type: new)

In Automated Essay Scoring (AES), benchmarking has fostered minimalist evaluation practices, in contrast with the broader recommendations of evaluation frameworks such as the argument-based validation (ABV) framework, which argues for a multidimensional assessment of systems, especially in the context of high-stakes language tests. In this paper, we introduce an enhanced and more practical version of the ABV framework that incorporates fairness analysis, correlations with linguistic features, prediction error evaluation, and model agreement compared with human raters. Applying this framework to French AES, we compare eight model architectures on a corpus of 27k exam essays (two raters each) and a generalization corpus of 961 essays (at least nine raters each).
