Hierarchical Semantic-Constrained Heterogeneous Graph for Audio-Visual Event Localization (Event)

PRODUCT_LAUNCH | 2026-06-08 | Impact: MEDIUM

arXiv:2606.07033v1 Announce Type: cross

Abstract: Open-vocabulary audio-visual event localization (OV-AVEL) jointly models audio-visual cues to recognize and temporally localize events, including categories unseen during training. Existing methods primarily learn joint audio-visual representations in Euclidean space, but still face two significant challenges. First, the lack of supervision signals for unse
