摘要
arXiv:2605.29339v1 Announce Type: new Abstract: With the rapid advancement of multimodal large language models (MLLMs), models have demonstrated increasingly powerful multimodal capabilities. However, whether MLLMs trained through statistical learning can truly understand the causal relationships underlying the real world remains a key research question. In recent years, numerous multimodal causal reasoning datasets have been proposed. Nevertheless, these datasets are either limited in scale or constructed from synthetic images and videos, cartoon-based content, or other non-realistic multimodal sources. To address these limitations, we collect real-world videos and construct DMC-CF-Static, a large-scale benchmark for multimodal causal counterfactual reasoning.
相关事件查看全部 (1)
相关公司
暂无数据
相关人物
暂无数据