TeachObs: A Human-Validated Benchmark for Multimodal Teaching Observation and Model Evaluation 文章

ArXiv CS.CL2026-06-01NEWSen作者: Yeil Jeong, Youngjin Yoo, Seobin Sohn, Hyejin Han, Jinseo Lee, Scott Howard, Unggi Lee

摘要

arXiv:2605.30673v1 Announce Type: new Abstract: Classroom videos contain observable teaching practices, but their pedagogical and visual signals are rarely organized in forms suitable for model evaluation. We present \textit{TeachObs}, a human-validated benchmark for multimodal teaching observation in classroom videos. \textit{TeachObs} includes 30 public lesson videos from eight countries divided into 5,158 fixed 15-second scenes. Seven researchers annotated each scene with 39 binary observation codes, covering 20 visual codes, such as gesture, board work, pointing, and visual materials, and 19 nonvisual codes, such as instruction, monitoring, questioning, feedback, and reflection. Gold segment labels are constructed using reliability- and prevalence-aware rules based on Krippendorff's alpha.