World Models in Words: Auditing Physical State-Transition Commitments in Vision-Language Models (Event)

PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM

arXiv:2605.29585v1 | Announce Type: new

Abstract: Vision-language models (VLMs) are increasingly used to answer questions about physical scenes, yet most evaluations reduce performance to a single final answer. This practice hides whether the model perceived the right objects, represented the right physical state, predicted a plausible transition, or merely selected the right option for the wrong reasons. We introduce