DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories

arXiv cs.CL, 2026-05-29. Authors: Neemesh Yadav, Palakorn Achananuparp, Jing Jiang, Ee-Peng Lim

Abstract

arXiv:2604.20443v2

We introduce DialToM, an annotated Theory of Mind (ToM) benchmark built from naturalistic human-human dialogues using a multiple-choice evaluation framework. Concurrent with recent work showing a gap between explicit mental-state inference and applied ToM in synthetic settings (SimpleToM; Gu et al., 2024), we establish a stricter State-Driven Diagnostic Probe in which models must forecast state-consistent dialogue trajectories solely from isolated mental-state profiles, without dialogue context. Our evaluation reveals a systematic reasoning asymmetry: LLMs excel at inferring mental states (Literal ToM) but struggle to leverage them for social forecasting (Functional ToM). Crucially, a domain expert achieves 100% accuracy on this task, proving its validity and establishing a stark human-AI capability gap.
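The asymmetry described in the abstract can be pictured as a difference between two multiple-choice accuracies. The following is only a minimal sketch under stated assumptions: the `Item` record, the task labels `"literal"` and `"functional"`, and the `asymmetry` helper are hypothetical names introduced here for illustration, and the toy predictions are invented placeholders, not data or results from DialToM.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One multiple-choice item (hypothetical schema, not the benchmark's)."""
    task: str       # assumed labels: "literal" or "functional"
    gold: str       # correct option letter
    predicted: str  # option letter chosen by the model


def accuracy_by_task(items):
    """Return the fraction of correctly answered items for each task label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.task] += 1
        correct[item.task] += int(item.predicted == item.gold)
    return {task: correct[task] / total[task] for task in total}


def asymmetry(items):
    """Literal ToM accuracy minus Functional ToM accuracy (assumed definition)."""
    acc = accuracy_by_task(items)
    return acc["literal"] - acc["functional"]


# Invented toy predictions, used only to exercise the functions.
items = [
    Item("literal", "A", "A"),
    Item("literal", "C", "C"),
    Item("literal", "B", "B"),
    Item("literal", "D", "A"),
    Item("functional", "B", "B"),
    Item("functional", "A", "C"),
    Item("functional", "D", "A"),
    Item("functional", "C", "C"),
]

print(accuracy_by_task(items))
print(asymmetry(items))
```

A positive value of `asymmetry` would correspond to the pattern the abstract reports, where inferring mental states is easier for a model than using them to forecast dialogue trajectories.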