When Can We Trust Early Warnings? Leakage-Excluded Early Outcome Prediction from LMS Interaction Logs

arXiv cs.AI | 2026-05-26 | Authors: Ngoc Luyen Le, Marie-Hélène Abel, Bertrand Laforge

Abstract

arXiv:2605.25794v1

Early-warning models built from Learning Management System (LMS) logs aim to predict end-of-course outcomes early enough to enable timely learner support. However, reported "early" performance is often inflated by temporal leakage, which occurs when the pipeline uses information that would not yet be available at the time of prediction. We formalize cutoff-based early outcome prediction under a temporal availability constraint and introduce LEAP (Leakage-Excluded Early-Availability Protocol), which enforces cutoff-first truncation prior to joins and aggregation and audits feature provenance to prevent post-cutoff evidence from entering the benchmark. We instantiate LEAP on the public Open University Learning Analytics Dataset (OULAD) as a multi-step protocol for leakage-controlled evaluation across weekly cutoffs. We evaluate several standard learning methods using ROC-AUC, PR-AUC, Brier score, and F1@0.5.
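To make the idea of cutoff-first truncation concrete, here is a minimal sketch in plain Python. It is not the authors' LEAP implementation. The field names (`id_student`, `date`, `sum_click`) are modelled on OULAD's click-log table. The function names, the two features, and the rule that week `k` covers days strictly below `7 * k` are all assumptions made for illustration.

```python
from collections import defaultdict


def truncate_at_cutoff(events, cutoff_day):
    """Cutoff-first step: drop every event not yet observable at the cutoff."""
    return [e for e in events if e["date"] < cutoff_day]


def audit_provenance(events, cutoff_day):
    """Raise if any event about to feed a feature is dated at or after the cutoff."""
    late = [e for e in events if e["date"] >= cutoff_day]
    if late:
        raise ValueError(f"{len(late)} post-cutoff event(s) reached aggregation")


def aggregate_clicks(events):
    """Build per-student features from events that were already truncated."""
    clicks = defaultdict(int)
    days = defaultdict(set)
    for e in events:
        clicks[e["id_student"]] += e["sum_click"]
        days[e["id_student"]].add(e["date"])
    return {
        student: {"total_clicks": clicks[student], "active_days": len(days[student])}
        for student in clicks
    }


def features_at_cutoff(events, cutoff_week):
    """Truncate first, audit, and only then aggregate (hypothetical pipeline)."""
    cutoff_day = 7 * cutoff_week  # assumption: day 0 is the course start
    visible = truncate_at_cutoff(events, cutoff_day)
    audit_provenance(visible, cutoff_day)
    return aggregate_clicks(visible)


# Toy log: student 1 has a burst of activity on day 40, well after week 2.
log = [
    {"id_student": 1, "date": 3, "sum_click": 5},
    {"id_student": 1, "date": 10, "sum_click": 7},
    {"id_student": 1, "date": 40, "sum_click": 50},
    {"id_student": 2, "date": 12, "sum_click": 2},
]

week2 = features_at_cutoff(log, cutoff_week=2)
print(week2)
```

Aggregating the full log and filtering afterwards would let the day-40 burst inflate student 1's week-2 feature from 12 clicks to 62. The audit step guards against this: it raises if untruncated events are passed to it. The same ordering would apply to any join against other tables, with each table truncated before it is merged.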