WALL-WM: Carving World Action Modeling at the Event Joints

Event: PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.01955v1 Announce Type: cross

Abstract: WALL-WM is a World Action Model that shifts video-action learning from chunk-centric optimization to event-grounded Vision-Language-Action pretraining, using semantically coherent action events as the atomic unit of learning. Existing WAMs commonly initialize from multimodal or video foundation models and then optimize fixed-length action chunks conditioned directly on the current observ