摘要
arXiv:2604.20443v2 Announce Type: replace Abstract: We introduce DialToM, an annotated Theory of Mind (ToM) benchmark built from naturalistic human-human dialogues using a multiple-choice evaluation framework. Concurrent with recent work showing a gap between explicit mental-state inference and applied ToM in synthetic settings~\cite{gu2024simpletom}, we establish a stricter \emph{State-Driven Diagnostic Probe} in which models must forecast state-consistent dialogue trajectories solely from isolated mental-state profiles without dialogue context. Our evaluation reveals a systematic reasoning asymmetry -- LLMs excel at inferring mental states (Literal ToM) but struggle to leverage them for social forecasting (Functional ToM). Crucially, a domain expert achieves 100\% accuracy on this task, proving its validity and establishing a stark human-AI capability gap.
相关事件查看全部 (1)
相关公司
暂无数据
相关人物
暂无数据