Can VLMs Predict Future States? Bootstrapping World Models from Inverse Dynamics · Event

PRODUCT_LAUNCH · 2026-06-04 · Impact: MEDIUM

arXiv:2506.06006v3 · Announce Type: replace

Abstract: Can unified vision-language models (VLMs) perform forward dynamics prediction (FDP), i.e., predicting the future state (in image form) given the previous observation and an action (in language form)? We find that VLMs struggle to generate physically plausible transitions between frames from instructions. Nevertheless, we identify a crucial asymmetry in multimodal
