Audio-Visual World Models: Grounding Multisensory Imagination for Embodied Agents
Event: PRODUCT_LAUNCH · 2026-06-08 · Impact: MEDIUM
arXiv:2512.00883v3 · Announce Type: replace-cross
Abstract: World models simulate environmental dynamics to enable agents to plan and reason about future states. While existing approaches have primarily focused on visual observations, real-world perception inherently involves multiple sensory modalities. Audio provides crucial spatial and temporal cues such as sound source localization and acoustic scene properties,
Related coverage
Audio-Visual World Models: Grounding Multisensory Imagination for Embodied Agents
ArXiv CS.CV · 2026-06-08