SalsaAgent: A multimodal embodied language model for interactive dance generation

ArXiv CS.CV | 2026-05-29 | Authors: Payam Jome Yazdian, Zoe Stanley, Angelica Lim

Abstract

arXiv:2605.29219v1

Interaction between humanoids involves bidirectional and nonverbal reactivity, coordination and synchrony. Toward socially aware robots and interactive virtual agents, we present SalsaAgent, a language model that generates expressive, full-body salsa dance motions in reaction to a human leader and against a contextual music backdrop. We formulate interaction as nonverbal motion token passing, extending the vocabulary of a large language model (LLM) to process discrete motion tokens, pairwise relation tokens, and audio. Our contributions include new tokens for full-body and motion relations, LLM fine-tuning using automatically derived text descriptions of skeleton dynamics for token grounding, and a two-stage token-to-diffusion pipeline.
