Details
- Source site
- ArXiv CS.CV
- Authors
- Runpei Dong, Ziyan Li, Arjun Gupta, Xialin He, Saurabh Gupta
- Article type
- NEWS
- Language
- en
- Publication date
- 2026-06-05
Abstract
arXiv:2602.16705v3 (Announce Type: replace-cross)
Visual loco-manipulation of arbitrary in-the-wild objects requires accurate end-effector (EE) control and a generalizable understanding of the scene from visual inputs (e.g., RGB-D images). Existing imitation and sim2real methods learn both of these aspects jointly through monolithic end-to-end training, which makes them hard to scale. In this work, we apply the best tool to each problem (large vision models for generalizable scene understanding, and simulated training for accurate EE control), yielding a modular loco-manipulation system that generalizes strongly. Our core technical innovation is HERO, an accurate residual-aware EE tracking policy made possible by combining classical robotics with machine learning.
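The abstract does not describe HERO's internals. The sketch below is only a generic illustration of what "residual-aware" tracking can mean when classical robotics is paired with a learned component. A classical solver (analytic inverse kinematics for a hypothetical 2-link planar arm) computes joint targets. A stand-in for a learned model predicts the EE tracking residual, and the commanded target is shifted to compensate for it. The link lengths, the simulated offset in `execute`, and `residual_model` are all hypothetical assumptions. This is not the authors' method.

```python
import math

L1, L2 = 0.4, 0.3  # hypothetical link lengths (meters)

def fk(q1, q2):
    """Forward kinematics of a 2-link planar arm: joint angles -> EE position."""
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y

def ik(x, y):
    """Classical analytic inverse kinematics (one of the two elbow solutions)."""
    c2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    if abs(c2) > 1.0:
        raise ValueError("target out of reach")
    q2 = math.acos(c2)
    q1 = math.atan2(y, x) - math.atan2(L2 * math.sin(q2), L1 + L2 * math.cos(q2))
    return q1, q2

def execute(q1, q2):
    """Stand-in for the real robot: ideal FK plus an unmodeled offset (hypothetical)."""
    x, y = fk(q1, q2)
    return x + 0.02, y - 0.01

def residual_model(x, y):
    """Hypothetical learned predictor of the EE tracking residual.
    Here it is a fixed, slightly imperfect estimate of the true offset."""
    return 0.018, -0.009

def track(target, use_residual=True):
    """Command the arm toward `target`, optionally pre-compensating the predicted residual."""
    tx, ty = target
    if use_residual:
        rx, ry = residual_model(tx, ty)
        tx, ty = tx - rx, ty - ry  # aim off-target so the residual lands us on it
    return execute(*ik(tx, ty))

if __name__ == "__main__":
    target = (0.5, 0.2)
    for flag in (False, True):
        reached = track(target, use_residual=flag)
        print(f"residual-aware={flag}: EE error = {math.dist(reached, target):.4f} m")
```

In this toy setup the correction shrinks the EE error roughly tenfold, because the predicted residual is close to the simulated one. In a real system the residual model would be learned (for example, in simulation) and would depend on state, not be a constant.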