Extending GCC-PHAT using Shift Equivariant Neural Networks

Research output: Chapter in book/report/conference proceeding; conference paper in proceeding; peer-reviewed

Abstract

Speaker localization using microphone arrays depends on accurate time delay estimation techniques. For decades, methods based on the generalized cross correlation with phase transform (GCC-PHAT) have been widely adopted for this purpose. Recently, the GCC-PHAT has also been used to provide input features to neural networks in order to remove the effects of noise and reverberation, but at the cost of losing theoretical guarantees in noise-free conditions. We propose a novel approach to extending the GCC-PHAT, where the received signals are filtered using a shift equivariant neural network that preserves the timing information contained in the signals. Through extensive experiments, we show that our model consistently reduces the error of the GCC-PHAT in adverse environments, while guaranteeing exact time delay recovery in ideal conditions.
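The abstract rests on two ideas that a short numerical sketch can make concrete: GCC-PHAT estimates a delay from the phase of the cross-power spectrum alone, and a shift equivariant transform applied identically to both channels leaves that delay intact. The NumPy sketch below is illustrative only. `gcc_phat`, `circ_conv`, and `toy_equivariant_filter` are hypothetical helpers; the "network" is a fixed random convolution, ReLU, convolution stack with shared weights, not the trained model from the paper; and circular shifts are assumed so that equivariance holds exactly.

```python
import numpy as np


def gcc_phat(x1, x2, max_shift=None, eps=1e-12):
    """Estimate the delay of x2 relative to x1, in samples, with GCC-PHAT.

    Uses an FFT of length len(x1), i.e. a circular cross-correlation.
    Returns (delay, cc), where cc[i] is the correlation at lag i - max_shift.
    """
    n = len(x1)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)              # cross-power spectrum
    cross /= np.abs(cross) + eps          # phase transform: keep only the phase
    cc = np.fft.irfft(cross, n=n)         # generalized cross-correlation
    if max_shift is None:
        max_shift = n // 2
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    delay = int(np.argmax(cc)) - max_shift
    return delay, cc


def circ_conv(x, h):
    """Circular convolution via the FFT; it commutes exactly with np.roll."""
    n = len(x)
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h, n=n), n=n)


def toy_equivariant_filter(x, h1, h2):
    """Hypothetical shift equivariant 'network': conv -> ReLU -> conv.

    Each layer commutes with circular shifts, so the whole map does too.
    This is an illustration, not the architecture used in the paper.
    """
    return circ_conv(np.maximum(circ_conv(x, h1), 0.0), h2)


rng = np.random.default_rng(0)
n, true_delay = 1024, 7
x1 = rng.standard_normal(n)
x2 = np.roll(x1, true_delay)              # ideal, noise-free circular delay

h1 = rng.standard_normal(32)              # weights shared by both channels
h2 = rng.standard_normal(32)
y1 = toy_equivariant_filter(x1, h1, h2)
y2 = toy_equivariant_filter(x2, h1, h2)

d_raw, _ = gcc_phat(x1, x2)               # GCC-PHAT on the raw signals
d_filt, _ = gcc_phat(y1, y2)              # GCC-PHAT after shared filtering
print(d_raw, d_filt)
```

Running the script prints `7 7`: because the filter commutes with shifts, the filtered channels are still exact shifted copies of each other, and the phase transform discards the filter's magnitude response. With real recordings the delays are linear rather than circular and frames are finite, so this equivalence is only approximate near frame edges.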
Original language: English
Host publication title: Proceedings of the Annual Conference of the International Speech Communication Association 2022
Publisher: ISCA
Pages: 1791-1795
Number of pages: 5
DOI:
Status: Published - 2022
Event: Interspeech 2022 - Incheon, Republic of Korea
Duration: 18 Sep 2022 to 22 Sep 2022

Publication series

Name: Interspeech
Publisher: ISCA

Conference

Conference: Interspeech 2022
Country/Territory: Republic of Korea
City: Incheon
Period: 2022/09/18 to 2022/09/22

Subject classification (UKÄ)

  • Signal Processing
