GRADE: Generalizable Reasoning-Aware Dialogue Evaluation for AI Tutors (Event)

OPEN_SOURCE | 2026-05-28 | Impact: MEDIUM

arXiv:2605.27866v1 (Announce Type: new)

Abstract: Evaluating AI tutor responses requires more than factual correctness: tutors must identify mistakes, locate errors, provide guidance, and offer actionable next steps. We present GRADE, a systematic study of open-source models for pedagogical ability assessment in student-tutor dialogues. Building on the BEA 2025 TutorMind setting, we evaluate 120 configurations across five lang