AnyMo: Scaling Any-Modality Conditional Motion Generation with Masked Modeling 文章

ArXiv CS.CV2026-06-02NEWSen作者: Yiheng Li, Zhuo Li, Ruibing Hou, Yingjie Chen, Hong Chang, Hao Liu, Shiguang Shan

摘要

arXiv:2605.29488v2 Announce Type: replace Abstract: Conditional human motion generation remains a fundamental challenge in computer vision and robotics. Despite significant progress, current methods are often constrained by fixed modality configurations and task-specific architectures, leaving cross-modal interactions and the scaling laws of multimodal-conditioned synthesis largely underexplored. A key bottleneck is the scarcity of large-scale modality-aligned motion data, limiting generalization across diverse control signals. In this work, we introduce OmniHuMo, a large-scale, high-quality dataset comprising over 5,000 hours of motion and 3.2 million sequences with precisely aligned multimodal annotations (e.g., text, speech, music, and trajectory).

相关公司

暂无数据

相关人物

暂无数据

相关技术

暂无数据