Learning Visual Spatial Planning from Symbolic State via Modality-Gap-Aware Self-Distillation | Event

PRODUCT_LAUNCH | 2026-06-05 | Impact: MEDIUM

arXiv:2606.06076v1 | Announce Type: cross

Abstract: While vision-language models excel at general multimodal understanding, they still struggle with visual spatial planning. We attribute this to a perception-reasoning modality gap: visual planning requires models to infer latent state structures from pixels and then reason over the recovered structure to produce valid actions, whereas symbolic planning d