Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM

arXiv:2605.30161v1. Announce Type: new.

Abstract: Vision-language models (VLMs) achieve strong performance on spatial reasoning benchmarks, yet it remains unclear whether this reflects structured 3D understanding or reliance on statistical shortcuts in natural images. We introduce a representation-level analysis framework that constructs minimal contrastive pairs to measure how spatial axes are organized and disentangled w