SCAPO: Self-Supervised Category-Level Articulated Pose Estimation from a Single 3D Observation

arXiv cs.CV | 2026-06-02 | Authors: Can Zhang, Gim Hee Lee

Abstract

arXiv:2606.01940v1 (new submission). Existing methods for category-level object articulation from a single 3D observation often rely on dense supervision, multi-frame inputs, or CAD templates, and still struggle to disentangle geometry from articulation or to recover explicit joint parameters. We propose SCAPO, a self-supervised framework that estimates canonical geometry, rigid part segmentation, and joint pivots, axes, and articulation states from a single RGB-D observation, without ground-truth labels or category-specific models. SCAPO first uses an SE(3)-equivariant vector-neuron autoencoder to factor out global pose and align diverse instances into a shared canonical space. On this aligned shape, a joint-aware blend-skinning module then models part motion.
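To make the idea of joint-aware blend skinning concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single revolute joint, two parts (one static, one moving), and standard linear blending of point positions by soft part weights; the abstract does not specify any of these details. The function names `rotate_about_axis` and `blend_skin` are hypothetical.

```python
import math


def rotate_about_axis(point, pivot, axis, angle):
    """Rotate `point` by `angle` radians about the line through `pivot`
    with direction `axis`, using Rodrigues' rotation formula."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]                  # unit joint axis
    v = [p - c for p, c in zip(point, pivot)]     # point relative to pivot
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    return [r + c for r, c in zip(rotated, pivot)]


def blend_skin(points, weights, pivot, axis, angle):
    """Assumed two-part linear blend skinning: the static part keeps its
    canonical position, the moving part rotates about the joint, and
    weights[i] in [0, 1] is point i's soft assignment to the moving part."""
    posed = []
    for p, w in zip(points, weights):
        moved = rotate_about_axis(p, pivot, axis, angle)
        posed.append([(1.0 - w) * pi + w * mi for pi, mi in zip(p, moved)])
    return posed


# Three canonical points on the x-axis; only the last belongs to the moving part.
canonical = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
weights = [0.0, 0.0, 1.0]
posed = blend_skin(
    canonical,
    weights,
    pivot=[1.0, 0.0, 0.0],    # joint pivot
    axis=[0.0, 0.0, 1.0],     # joint axis
    angle=math.pi / 2,        # articulation state
)
```

In this toy setup the pivot, axis, and angle play the roles of the joint parameters and articulation state named in the abstract, while the weights stand in for a soft rigid part segmentation. With hard weights as above, the moving point swings a quarter turn around the pivot and the static points stay in place.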