To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection · Event

Name: To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection
Start: 2026-06-05

PRODUCT_LAUNCH · 2026-06-05 · Impact: MEDIUM

To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection
arXiv:2606.05931v1 · Announce Type: cross

Abstract: When retrieving a person from a video archive by voice and face, should the system be multimodal or not? In real-world broadcast archives, unlike curated benchmarks, a target may be heard but unseen, seen but unheard, or both. Fusing scores from an absent modality injects noise, degrading precision below the best unimodal system. We propose

Artificial Intelligence

Relationship Graph

To Be Multimodal or Not to Be: Query-Adaptive Audio-Visual Person Retrieval via Active Modality Detection · Related Companies

Abstract

arXiv (NONPROFIT)

Framework (COMPANY)

EAT (NONPROFIT)

ACT (NONPROFIT)

Cast (COMPANY)

Ada

Speak

SCORE

VIA (COMPANY)