Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models
Event: PRODUCT_LAUNCH
Date: 2026-06-03
Impact: MEDIUM
arXiv:2606.03988v1. Announce Type: new.
Abstract: Vision language models (VLMs) excel at many tasks but still struggle with spatial reasoning when critical information is not directly observable. Many such problems require imaginative perception: inferring what would be seen from an unseen viewpoint, tracing paths through occluded spaces, or integrating partial observations into a coherent spatial representation