Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model (Event)

PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM

arXiv:2510.27607v3. Announce Type: replace.

Abstract: Augmenting vision-language-action models (VLAs) with world models is promising for robotic policy learning but faces challenges in jointly predicting states and actions due to the modality gap. To address this, we propose DUal-STream diffusion (DUST), a world-model augmented VLA framework featuring a multimodal diffusion transformer that maintains separate modality st