Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data 文章

ArXiv CS.AI2026-05-29NEWSen作者: Ragib Amin Nihal, Benjamin Yen, Runwu Shi, Takeshi Ashizawa, Kazuhiro Nakadai

摘要

arXiv:2502.20838v3 Announce Type: replace-cross Abstract: Passive acoustic monitoring (PAM) systems generate continuous recordings spanning months, yet automated bioacoustic analysis of whale calls requires two separate annotation efforts: binary presence labels for classification and precise temporal boundaries for localization. A binary label for a multi-minute recording can be assigned in seconds, but timestamping every call within it requires hours of expert effort. Providing both is infeasible at operational scale. We present DSMIL-LocNet, a weakly supervised multiple instance learning (MIL) framework that performs both classification and temporal localization using only recording-level presence/absence labels. Our dual-stream architecture integrates spectral and temporal features to process recordings of 2--30 minutes without the temporal compression that degrades existing CNN methods on long inputs. On the AcousticTrends BlueFinLibrary, DSMIL-LocNet achieves F1 scores of 0.