Self-supervised Hierarchical Visual Reasoning with World Model

ArXiv CS.AI | 2026-05-26 | Authors: Yuanfei Xu, Lin Liu, Wengang Zhou, Mingxiao Feng, Houqiang Li

Abstract

arXiv:2605.17537v2

3D open-world environments with adversarial opponents remain a core challenge for reinforcement learning due to their vast state spaces. Effective reasoning representations are essential in such settings. Existing self-supervised visual foresight reasoning approaches often suffer from multi-step error accumulation, and many recent studies resort to injecting domain-specific knowledge for more stable guidance. Our key insight is that the photorealistic fidelity of visual reasoning representations is secondary; what truly matters is providing informative, task-relevant signals. To this end, we propose ResDreamer, a hierarchical world model in which each higher-level layer is trained to reconstruct the residuals of the layer below. This design enables progressive abstraction of increasingly sophisticated world dynamics and fosters the emergence of richer latent representations.
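The residual idea in the abstract, where each higher level is trained on what the level below fails to reconstruct, can be illustrated with a toy stagewise fit. This is only a sketch under strong assumptions, not ResDreamer itself: the abstract does not specify the architecture, losses, or latent spaces, so each "level" here is a hypothetical one-parameter least-squares predictor, and the names `fit_scale` and `train_residual_hierarchy` are invented for illustration.

```python
# Toy sketch only: the abstract does not describe ResDreamer's actual layers or losses.
# Each "level" is a hypothetical one-parameter least-squares predictor.


def fit_scale(feature, target):
    """Return the coefficient c that minimizes sum((target - c * feature) ** 2)."""
    denom = sum(f * f for f in feature)
    if denom == 0.0:
        return 0.0  # a zero feature cannot explain anything
    return sum(f * t for f, t in zip(feature, target)) / denom


def train_residual_hierarchy(features_per_level, observations):
    """Fit level k to the residual left over by levels 0..k-1.

    Returns the per-level coefficients and the squared residual error
    remaining after each level has been fitted.
    """
    residual = list(observations)
    coeffs, errors = [], []
    for feature in features_per_level:
        c = fit_scale(feature, residual)
        # Whatever this level cannot explain becomes the next level's target.
        residual = [r - c * f for r, f in zip(residual, feature)]
        coeffs.append(c)
        errors.append(sum(r * r for r in residual))
    return coeffs, errors


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    observations = [1.0 + 0.5 * x + 0.25 * x * x for x in t]
    # Level 0 sees a constant feature, level 1 a linear one, level 2 a quadratic one.
    levels = [[1.0] * len(t), t, [x * x for x in t]]
    coeffs, errors = train_residual_hierarchy(levels, observations)
    print("coefficients per level:", coeffs)
    print("squared error after each level:", errors)
```

Because each level only has to model the leftover error, the remaining squared error never increases as levels are added. In a learned world model the per-level predictors would presumably be neural networks over latent states rather than fixed features, but that detail is not given in the abstract.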