EgoExo-WM: Unlocking Exo Video for Ego World Models

arXiv cs.CV, 2026-05-27. Authors: Danny Tran, Roberto Martín-Martín, Kristen Grauman

Abstract

arXiv:2605.15477v2

Egocentric world models are a promising way to let agents predict and plan, but their performance is limited by the scarcity of egocentric training data and by the fact that egocentric video inherently gives only a partial view of a person's physical actions. Exocentric video, in contrast, is abundant and shows body poses clearly, but it is not directly aligned with an agent's action space, and it is not egocentric. We propose a method that bridges this gap by extracting structured body pose from exocentric video to serve as an action representation, and by transforming the exocentric video into egocentric video, informed by a human kinematics prior. This process makes it possible to incorporate in-the-wild exocentric video into egocentric world model training.