Details
- Source site
- ArXiv CS.CV
- Authors
- Simen Hexeberg, Fanghui Tong, Hari Vishnu, Mandar Chitre
- Article type
- NEWS
- Language
- en
- Publication date
- 2026-06-10
Abstract
arXiv:2606.10407v1 (announce type: cross)
Passive acoustic monitoring enables large-scale observation of wildlife, but most bioacoustic classifiers only predict species presence in a time window without localizing vocalizations precisely in time or frequency, limiting downstream analyses. We formulate bird vocalization detection as an object detection task on spectrograms and train YOLO11 models to localize bird calls in dense tropical soundscapes from Singapore. We additionally introduce an open-source browser-based annotation tool and propose Intersection over Minimum (IoMin), an evaluation metric that better handles ambiguous acoustic boundaries than standard IoU and is better suited to the problem at hand. The best YOLO model nearly doubles baseline performance on in-distribution soundscapes from Singapore (81.8% vs. 42.1% IoMin@50 F1-score) while still outperforming the baseline on unseen out-of-distribution recordings from Hawaii (58.6% vs. 48.6%).
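The abstract names IoMin but does not spell out its formula. The sketch below assumes the natural reading of the name: the intersection area of two boxes divided by the area of the smaller box, compared with standard IoU. The `Box` layout (time in seconds, frequency in Hz) and the function names are hypothetical and chosen for illustration only; they are not taken from the authors' code.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Time-frequency box on a spectrogram (hypothetical layout)."""
    t_min: float  # start time, seconds
    f_min: float  # lower frequency, Hz
    t_max: float  # end time, seconds
    f_max: float  # upper frequency, Hz


def area(b: Box) -> float:
    """Box area in seconds * Hz, clamped at zero for degenerate boxes."""
    return max(0.0, b.t_max - b.t_min) * max(0.0, b.f_max - b.f_min)


def intersection(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes, zero if they are disjoint."""
    dt = min(a.t_max, b.t_max) - max(a.t_min, b.t_min)
    df = min(a.f_max, b.f_max) - max(a.f_min, b.f_min)
    return max(0.0, dt) * max(0.0, df)


def iou(a: Box, b: Box) -> float:
    """Standard Intersection over Union."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def iomin(a: Box, b: Box) -> float:
    """Assumed IoMin: intersection divided by the smaller box's area."""
    smaller = min(area(a), area(b))
    return intersection(a, b) / smaller if smaller > 0 else 0.0


# A generous annotation and a tighter prediction around the same call.
annotation = Box(0.0, 2000.0, 1.0, 4000.0)
prediction = Box(0.25, 2500.0, 0.75, 3500.0)

print(f"IoU   = {iou(annotation, prediction):.2f}")
print(f"IoMin = {iomin(annotation, prediction):.2f}")
```

Under this assumed definition, a prediction fully contained in a loosely drawn annotation scores 1.0 on IoMin but only 0.25 on IoU, which is one way a metric could tolerate ambiguous call boundaries. Reading "IoMin@50" as a match threshold of 0.5, by analogy with IoU@50, is likewise an assumption.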