Abstract

This technical report gives an overview of our submission to Task 3 of the DCASE 2024 challenge. We present a sound event localization and detection (SELD) system whose input features are based on trainable neural generalized cross-correlations with phase transform (NGCC-PHAT). Using these features together with spectrograms as input to a Transformer-based network, we achieve significant improvements over the baseline method. We also present an audio-visual version of our system, in which distance predictions are updated using depth maps from the panorama video frames.
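
For orientation, the classical (non-trainable) GCC-PHAT that gives NGCC-PHAT its name can be sketched as below. This is a minimal NumPy illustration under our own assumptions, not the implementation used in the submission: the function name `gcc_phat`, the parameter `max_lag`, and the stabilising constant `1e-12` are hypothetical choices, and the trainable neural components of NGCC-PHAT are not shown.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """Classical GCC-PHAT between two 1-D signals (illustrative sketch).

    Returns the correlation values for lags -max_lag..max_lag and the
    lag (in samples) at the peak. A positive lag means that y is a
    delayed copy of x. Assumes max_lag >= 1.
    """
    n = len(x) + len(y)  # zero-pad so the circular correlation does not wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12   # phase transform: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)    # back to the lag domain
    # Negative lags sit at the end of the array; move them to the front.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return cc, int(lags[np.argmax(cc)])
```

For a microphone pair, the peak lag divided by the sampling rate gives a time-difference-of-arrival estimate, which is the kind of spatial cue that correlation-based features of this family carry alongside spectrograms.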
Original language: English
Status: Published - 30 June 2024

Subject classification (UKÄ)

  • Computer Vision and Robotics (Autonomous Systems)

Fingerprint

Explore the research topics of "THE LU SYSTEM FOR DCASE 2024 SOUND EVENT LOCALIZATION AND DETECTION CHALLENGE". Together they form a unique fingerprint.
