EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction

ArXiv CS.AI, 2026-05-28. Authors: Chong Jing, Zitong Lan, Junan Zhang, Zhizheng Wu

Abstract

arXiv:2605.28101v1 (cross-listed). Predicting spatially varying room impulse responses (RIRs) from sparse observations is a critical but highly challenging inverse problem for immersive spatial audio rendering. In this work, we present EIGENET, a geometry-informed multi-modal framework for few-shot novel view RIR prediction. At its core is a Cross-view Alternate-attention Transformer that iteratively refines local intra-view acoustic structures and global cross-view spatial relationships. We empirically demonstrate that this architecture exploits the multi-view, multi-modal context while performing spatio-temporal reasoning for RIR prediction. Inspired by acoustic ray tracing, we design a geometry-informed modulation block that models the connection between geometric features and the RIR power spectrum. In addition, an auxiliary loss is introduced to turn single-target waveform prediction into a multi-task learning problem.
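The abstract does not spell out how the Cross-view Alternate-attention Transformer is built, so the following is only a minimal sketch of one plausible reading of "alternating" attention: a pass of self-attention restricted to the tokens of each view, followed by a pass over the tokens of all views together. Everything here is an assumption made for illustration, not the authors' implementation: the function names `self_attention` and `alternate_attention`, the `tokens[view][token]` layout, the identity query/key/value projections, the residual connection, and the choice to model the cross-view step as attention over the flattened token sequence.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption: identity Q/K/V projections (no learned weights) and a
    residual connection, to keep the sketch small.
    """
    dim = len(seq[0])
    out = []
    for query in seq:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in seq
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * value[j] for w, value in zip(weights, seq))
            for j in range(dim)
        ]
        out.append([q + m for q, m in zip(query, mixed)])
    return out


def alternate_attention(tokens, num_rounds=2):
    """Alternate intra-view and cross-view attention.

    tokens[v][t] is the feature vector of token t in view v
    (hypothetical layout; every view has the same number of tokens).
    """
    for _ in range(num_rounds):
        # Intra-view step: each view attends only to its own tokens.
        tokens = [self_attention(view) for view in tokens]

        # Cross-view step (assumed form): all tokens of all views attend
        # to each other, then the sequence is split back into views.
        per_view = len(tokens[0])
        flat = self_attention([tok for view in tokens for tok in view])
        tokens = [
            flat[v * per_view:(v + 1) * per_view]
            for v in range(len(tokens))
        ]
    return tokens


# Usage: 3 views, 4 tokens per view, 2-dimensional features.
views = [[[0.1 * (v + t), 0.2 * t] for t in range(4)] for v in range(3)]
refined = alternate_attention(views, num_rounds=2)
print(len(refined), len(refined[0]), len(refined[0][0]))
```

The output keeps the input layout (views, tokens, feature dimension), so rounds of this kind can be stacked; a real model would add learned projections, multiple heads, normalization, and feed-forward layers.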