Drift-Augmented Scoring: Text-Derived Noise Robustness for Zero-Shot Audio-Language Classification

ArXiv CS.CV | 2026-06-04 | NEWS | en | Authors: Tu Vo, Sheir Zaheer, Chan Y. Park

Details

Source site
ArXiv CS.CV
Authors
Tu Vo, Sheir Zaheer, Chan Y. Park
Article type
NEWS
Language
en
Publication date
2026-06-04

Abstract

arXiv:2606.04844v1. Announce type: cross.

Contrastive audio-language models such as CLAP enable zero-shot audio classification: a sound is labelled by matching its embedding to text prompt embeddings, with no labelled audio. This matching breaks down under acoustic noise, where accuracy and mAP fall by 12-30 percentage points at 0 dB SNR on standard benchmarks. We propose Drift-Augmented Scoring (DAS), a small per-class bonus added to the cosine score. The bonus rewards a class when the noisy audio embedding drifts in the direction that the class's noise-conditioned text prompts predict. It is derived from text alone, computed once and cached, and adds a single inner product per class at inference, with no gradients and no test-time batch. On a LAION CLAP backbone, we compare DAS against the four variants of Acevedo et al.'s concurrent method on UrbanSound8K and the full FSD50K eval set, mixing each clip with urban acoustic scene noise across a range of SNRs.
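The abstract does not give the DAS formula, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each class's drift direction is the normalized difference between the mean embedding of its noise-conditioned prompts and its clean prompt embedding, and that the bonus is a weighted inner product between the audio embedding and that direction. The function names, the weight `lam`, and the synthetic embeddings are all hypothetical; a real pipeline would take the embeddings from a CLAP text and audio encoder.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length (zero vectors stay zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def build_drift_directions(clean_text_emb, noisy_text_emb):
    """Text-only step, computed once and cached.

    clean_text_emb: (C, D) one clean prompt embedding per class.
    noisy_text_emb: (C, P, D) P noise-conditioned prompt embeddings per class.
    Returns (C, D) unit vectors pointing from clean to noise-conditioned text.
    ASSUMPTION: this definition of "drift" is illustrative, not from the paper.
    """
    clean = l2_normalize(clean_text_emb)
    noisy_mean = l2_normalize(l2_normalize(noisy_text_emb).mean(axis=1))
    return l2_normalize(noisy_mean - clean)


def das_scores(audio_emb, clean_text_emb, drift_dirs, lam=0.1):
    """Cosine score plus a per-class bonus (one extra inner product per class).

    ASSUMPTION: `lam` and the additive form are hypothetical choices.
    """
    a = l2_normalize(audio_emb)
    cosine = l2_normalize(clean_text_emb) @ a   # standard zero-shot score, (C,)
    bonus = drift_dirs @ a                      # alignment with cached drift, (C,)
    return cosine + lam * bonus


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, P, D = 3, 4, 8                           # classes, noisy prompts, dim
    clean = rng.normal(size=(C, D))
    noisy = clean[:, None, :] + 0.3 * rng.normal(size=(C, P, D))
    drift = build_drift_directions(clean, noisy)   # cache this
    audio = clean[1] + 0.5 * rng.normal(size=D)    # stand-in for a noisy clip
    scores = das_scores(audio, clean, drift)
    print("scores:", scores, "predicted class:", int(scores.argmax()))
```

Setting `lam=0` recovers plain cosine scoring, which makes it easy to check how much the bonus term alone changes a prediction. Because `build_drift_directions` needs only text embeddings, its output can be stored alongside the class prompt embeddings and reused for every clip.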