Keyword Transformer: A Self-Attention Model for Keyword Spotting

Axel Berg, Mark O'Connor, Miguel Tairum Cruz

Research output: Chapter in Book/Report/Conference proceeding › Conference paper in proceeding › Peer review

Abstract

The Transformer architecture has been successful across many domains, including natural language processing, computer vision and speech recognition. In keyword spotting, self-attention has primarily been used on top of convolutional or recurrent encoders. We investigate a range of ways to adapt the Transformer architecture to keyword spotting and introduce the Keyword Transformer (KWT), a fully self-attentional architecture that exceeds state-of-the-art performance across multiple tasks without any pre-training or additional data. Surprisingly, this simple architecture outperforms more complex models that mix convolutional, recurrent and attentive layers. KWT can be used as a drop-in replacement for these models, setting two new benchmark records on the Google Speech Commands dataset with 98.6% and 97.7% accuracy on the 12 and 35-command tasks respectively.
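For orientation, the sketch below shows one common way such a fully self-attentional keyword classifier can be assembled in PyTorch: each MFCC time frame is projected to a token embedding, a learnable class token and positional embeddings are added, and a standard Transformer encoder feeds a linear classification head. The layer sizes used here (40 MFCCs, 98 frames, 192-dimensional embeddings, 12 layers, 3 heads) and the frame-wise tokenization are assumptions for illustration only, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class KeywordTransformerSketch(nn.Module):
    """Minimal ViT-style keyword classifier over MFCC frames (illustrative sketch only)."""

    def __init__(self, n_mfcc=40, n_frames=98, dim=192, depth=12, heads=3, n_classes=35):
        super().__init__()
        # One token per MFCC time frame, projected to the embedding dimension.
        self.embed = nn.Linear(n_mfcc, dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_frames + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, mfcc):                      # mfcc: (batch, n_frames, n_mfcc)
        x = self.embed(mfcc)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                 # classify from the class token

# Example: a batch of 1-second clips, each with 98 frames of 40 MFCCs.
logits = KeywordTransformerSketch()(torch.randn(8, 98, 40))
print(logits.shape)  # torch.Size([8, 35])
```

The design mirrors the Vision Transformer recipe: no convolutions or recurrence, only self-attention over the token sequence, which is what allows such a model to act as a drop-in replacement for mixed convolutional/recurrent keyword spotters.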
Original language: English
Host publication title: Proc. Interspeech 2021
Publisher: ISCA
Pages: 4249-4253
Number of pages: 5
DOI
Status: Published - 30 Aug 2021
Event: Interspeech 2021 - Brno, Czech Republic
Duration: 30 Aug 2021 - 3 Sep 2021

Publication series

Name: Interspeech
Publisher: ISCA

Conference

Conference: Interspeech 2021
Country/Territory: Czech Republic
City: Brno
Period: 2021/08/30 - 2021/09/03

Subject classification (UKÄ)

  • Signal Processing
  • Mathematics

Free keywords

  • machine learning
  • keyword spotting
  • transformer
  • speech recognition
