Toward a Benchmark for Controllable Simulation of Imperfect Students with Large Language Models

arXiv cs.CL | 2026-05-26 | Authors: Alexander Apartsin, Omri Sason, Yehudit Aperstein

Abstract

arXiv:2605.25601v1. Teacher education requires deliberate practice with learners who exhibit identifiable strengths, weaknesses, and partial mastery. Large language models could support such practice by simulating students with known skill components, enabling teachers to rehearse explanations, diagnoses, and instructional responses. For this purpose, however, the central requirement is neither to maximize benchmark accuracy nor to suppress isolated facts, but to control model behavior so that it reflects a specified skill profile. This paper investigates whether prompted language models can be steered to retain some skills while suppressing others. We introduce a benchmark-oriented framework in which an explicit skill vector represents a simulated student, prompt-based control specifies retained and missing competencies, and behavior is evaluated using profile-alignment metrics, retained-versus-forgotten comparisons, and cross-skill calibration analyses.
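
The abstract names three ingredients: a skill vector, prompt-based control, and profile-based evaluation. As a rough illustration only, the sketch below shows one way these pieces could fit together. Everything in it is an assumption rather than the paper's implementation: the skill names, the prompt wording, the functions `build_control_prompt`, `retained_forgotten_gap`, and `profile_alignment`, and the metric definitions themselves are hypothetical, since the abstract does not specify them.

```python
from statistics import mean

# Hypothetical skill vector: 1 = retained, 0 = missing.
# The skill names are illustrative and do not come from the paper.
profile = {"fractions": 1, "decimals": 1, "negative_numbers": 0}


def build_control_prompt(profile):
    """Render a skill vector as a plain-text instruction for a prompted model."""
    retained = sorted(s for s, v in profile.items() if v)
    missing = sorted(s for s, v in profile.items() if not v)
    return (
        "You are simulating a student. "
        f"Skills you have mastered: {', '.join(retained) or 'none'}. "
        f"Skills you have not learned: {', '.join(missing) or 'none'}. "
        "Answer correctly only when a question relies on mastered skills."
    )


def retained_forgotten_gap(profile, results):
    """Compare accuracy on retained vs. missing skills.

    results is a list of (skill, is_correct) pairs, one per graded item.
    Returns (accuracy_on_retained, accuracy_on_missing, difference).
    """
    kept = [float(ok) for skill, ok in results if profile[skill]]
    lost = [float(ok) for skill, ok in results if not profile[skill]]
    acc_kept = mean(kept) if kept else 0.0
    acc_lost = mean(lost) if lost else 0.0
    return acc_kept, acc_lost, acc_kept - acc_lost


def profile_alignment(profile, results):
    """Assumed alignment score: share of items whose outcome matches the
    profile (correct on retained skills, incorrect on missing skills)."""
    matches = [float(bool(ok) == bool(profile[skill])) for skill, ok in results]
    return mean(matches) if matches else 0.0


# Made-up graded outcomes, as if collected from a simulated student.
results = [
    ("fractions", True),
    ("decimals", True),
    ("decimals", False),
    ("negative_numbers", False),
    ("negative_numbers", True),
]

prompt = build_control_prompt(profile)
acc_kept, acc_lost, gap = retained_forgotten_gap(profile, results)
alignment = profile_alignment(profile, results)
```

Under this assumed scoring, a well-controlled simulated student would show high accuracy on retained skills, low accuracy on missing ones, and an alignment score near 1; a model that simply answers everything correctly would score well on accuracy yet poorly on alignment, which is the distinction the abstract draws.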