Few-Shot Bioacoustic Event Detection Using an Event-Length Adapted Ensemble of Prototypical Networks

John Martinsson, Maria Sandsten, Martin Willbo, Aleksis Pirinen, Olof Mogren

Research output: Chapter in Book/Report/Conference proceeding › Paper in conference proceeding › peer-review

Abstract

In this paper we study two major challenges in few-shot bioacoustic event detection: variable event lengths and false positives. We use prototypical networks where the embedding function is trained using a multi-label sound event detection model instead of using episodic training as the proxy task on the provided training dataset. This is motivated by polyphonic sound events being present in the base training data. We propose a method to choose the embedding function based on the average event length of the few-shot examples and show that this makes the method more robust to variable event lengths. Further, we show that an ensemble of prototypical neural networks trained on different training and validation splits of time-frequency images with different loudness normalizations reduces false positives. In addition, we present an analysis of the effect that the studied loudness normalization techniques have on the performance of the prototypical network ensemble. Overall, per-channel energy normalization (PCEN) outperforms the standard log transform for this task. The method uses no data augmentation and no external data. The proposed approach achieves an F-score of 48.0% when evaluated on the hidden test set of the Detection and Classification of Acoustic Scenes and Events (DCASE) task 5.
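
The three ideas named in the abstract (prototype-based classification, choosing an embedding function from the average event length, and averaging an ensemble) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the "closest window" selection rule, the use of squared Euclidean distance, and the plain averaging of member outputs are all assumptions made for illustration, and the embeddings are taken as already computed.

```python
import numpy as np


def prototype(embeddings):
    """Class prototype: the mean of the support embeddings (rows)."""
    return np.asarray(embeddings, dtype=float).mean(axis=0)


def positive_probability(queries, pos_proto, neg_proto):
    """Probability that each query belongs to the positive (event) class.

    A softmax over negative squared Euclidean distances to two prototypes
    reduces to a logistic function of the distance difference.
    """
    queries = np.asarray(queries, dtype=float)
    d_pos = ((queries - pos_proto) ** 2).sum(axis=1)
    d_neg = ((queries - neg_proto) ** 2).sum(axis=1)
    return 1.0 / (1.0 + np.exp(d_pos - d_neg))


def select_window(event_lengths, candidate_windows):
    """Hypothetical selection rule: pick the candidate window length
    closest to the mean length of the few-shot events (same units)."""
    mean_length = float(np.mean(event_lengths))
    return min(candidate_windows, key=lambda w: abs(w - mean_length))


def ensemble_probability(member_probs):
    """Assumed combination rule: average the per-member probabilities."""
    return np.mean(np.asarray(member_probs, dtype=float), axis=0)


# Toy usage with made-up 2-D embeddings.
pos_proto = prototype([[1.0, 0.0], [1.0, 0.2]])
neg_proto = prototype([[-1.0, 0.0], [-1.0, -0.2]])
probs = positive_probability([[0.9, 0.0], [-0.8, 0.0]], pos_proto, neg_proto)
window = select_window([0.2, 0.3, 0.4], candidate_windows=[0.1, 0.25, 1.0])
combined = ensemble_probability([[0.2, 0.8], [0.4, 0.6]])
```

In this toy setup, the first query lies near the positive prototype and receives a probability above 0.5, while the second lies near the negative prototype and falls below it. Averaging over members trained on different splits or normalizations is one simple way an ensemble can suppress detections that only a single member supports.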
Original language: English
Title of host publication: Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022)
Number of pages: 5
ISBN (Electronic): 978-952-03-2677-7
Publication status: Published - 2022
Event: 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022) - Nancy, France
Duration: 2022 Nov 3 → 2022 Nov 4

Conference

Conference: 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022)
Abbreviated title: DCASE2022
Country/Territory: France
City: Nancy
Period: 2022/11/03 → 2022/11/04

Subject classification (UKÄ)

  • Probability Theory and Statistics
  • Signal Processing

Keywords

  • machine listening
  • bioacoustics
  • few-shot learning
  • ensemble
