3DVLA: Enhancing Vision-Language-Action Models via 3D Spatial and Instance Understanding

arXiv cs.CV · 2026-05-29 · Authors: Zhongyu Xia, Yousen Tang, Bingqing Wei, Yongtao Wang

Related technologies