Geometry-Aware Implicit Memory for Video World Models 文章

ArXiv CS.CV2026-06-02NEWSen作者: Zhengxuan Wei, Xu Guo, Xinghui Li, Xunzhi Xiang, Min Wei, Yiran Zhu, Qiulin Wang, Xintao Wang, Pengfei Wan, Xiangwang Hou, Qi Fan

摘要

arXiv:2606.02436v1 Announce Type: new Abstract: Video world models aim to simulate controllable visual environments, but long-horizon rollouts depend on what the model remembers after observations leave its native context window. Explicit memories retain frames or online 3D reconstructions, which can suffer from heuristic retrieval errors, redundant appearance storage, or reconstruction artifacts. Implicit memories compress history into a compact state, but existing designs are not explicitly constrained to encode cross-view scene geometry. We propose GIM-World, a geometry-aware implicit memory framework for video world models. A lightweight transformer encoder compresses variable-length history into fixed-size memory tokens, a camera-queryable geometry head distills 3D scene structure from a frozen foundation model into the memory during training, and an information-guided pruning rule keeps encoding cost bounded as history grows.

相关事件查看全部 (1)

Geometry-Aware Implicit Memory for Video World Models
2026-06-02PRODUCT_LAUNCH影响: MEDIUM

相关公司

暂无数据

相关人物

暂无数据