To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection 事件

PRODUCT_LAUNCH2026-06-05影响: MEDIUM

To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection arXiv:2606.05931v1 Announce Type: cross Abstract: When retrieving a person from a video archive by voice and face, should the system be multimodal or not? In real-world broadcast archives, unlike curated benchmarks, a target may be heard but unseen, seen but unheard, or both. Fusing scores from an absent modality injects noise, degrading precision below the best unimodal system. We propose

To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection · 相关公司

A
arXivNONPROFIT
F
FrameworkCOMPANY
E
EATNONPROFIT
A
ACTNONPROFIT
C
CastCOMPANY
V
VIACOMPANY