From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in Multi-modal Large Language Models 事件
PRODUCT_LAUNCH2026-06-11影响: MEDIUM
From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in Multi-modal Large Language Models arXiv:2602.08735v3 Announce Type: replace Abstract: While multimodal large language models (MLLMs) have made substantial progress in single-image spatial reasoning, multi-image spatial reasoning, which requires integration of information from multiple viewpoints, remains challenging. Cognitive studies suggest that humans address such tasks through two mechanisms: cross-view corresponden