Predicting Perceptual Centers Located at Vowel Onset in German Speech Using Long Short-Term Memory Networks

Felicia Schulz, Mirella De Sisto, M. Paula Roncaglia-Denissen, Peter Hendrix

Research output: Chapter in Book/Report/Conference proceeding > Paper in conference proceeding > Peer-review

Abstract

Perceptual centers (p-centers) can be defined as the perceived centers of a syllable. Previous research on the location of p-centers in speech has relied on experimental methods, and one of the acoustic features suggested to contribute to p-center location in Germanic languages is the transition from the consonant to the vowel onset. The current study investigates predicting the location of p-centers in German by means of machine learning. Machine learning is a promising tool for capturing possible non-linear relationships among the acoustic features involved in the complex process of human perception. Therefore, a Long Short-Term Memory (LSTM) neural network was used to identify p-centers in a set of spoken German sentences, with Mel Frequency Cepstral Coefficients (MFCCs), the amplitude envelope, and root mean square (RMS) energy as input features. The model achieved a balanced accuracy of 84%, with MFCCs being the best predictor of p-center location.
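The abstract combines frame-level acoustic features with a classifier scored by balanced accuracy. The Python sketch below illustrates two of the named features and the metric under stated assumptions; it is not the authors' implementation. It assumes a mono signal split into fixed-length frames (the frame length, hop size, and all function names are hypothetical), takes the amplitude envelope as the peak absolute sample per frame and RMS energy as the root mean square of each frame, and assumes a binary frame labeling (1 = p-center, 0 = other). Balanced accuracy is computed as the mean of per-class recall. MFCC extraction and the LSTM itself are omitted.

```python
import math
from typing import List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[Sequence[float]]:
    """Split a mono signal into (possibly overlapping) frames.

    Hypothetical helper: trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def amplitude_envelope(frames: Sequence[Sequence[float]]) -> List[float]:
    # Assumed definition: peak absolute amplitude within each frame.
    return [max(abs(s) for s in frame) for frame in frames]


def rms_energy(frames: Sequence[Sequence[float]]) -> List[float]:
    # Root mean square of the samples in each frame.
    return [math.sqrt(sum(s * s for s in frame) / len(frame)) for frame in frames]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Toy signal: two silent frames followed by two louder, "vowel-like" frames.
signal = [0.0] * 8 + [0.5, -0.8, 0.6, -0.4, 0.7, -0.9, 0.3, -0.5]
frames = frame_signal(signal, frame_len=4, hop=4)
print(amplitude_envelope(frames))  # [0.0, 0.0, 0.8, 0.9]
print(rms_energy(frames))

# Hypothetical frame labels: p-center frames (1) are rare.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
print(balanced_accuracy(y_true, [0] * 8))                   # 0.5
print(balanced_accuracy(y_true, [0, 0, 0, 0, 0, 0, 0, 1]))  # 0.75
```

Balanced accuracy is a reasonable choice when the positive class is presumably rare relative to other frames: in the toy labels, a predictor that always outputs 0 reaches 0.75 plain accuracy but only 0.5 balanced accuracy, the chance level for two classes.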

Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Pages: 1793-1797
Number of pages: 5
Volume: 2023-August
Publication status: Published - 2023
Externally published: Yes
Event: 24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 2023 Aug 20 to 2023 Aug 24

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Conference

Conference: 24th International Speech Communication Association, Interspeech 2023
Country/Territory: Ireland
City: Dublin
Period: 2023/08/20 to 2023/08/24

Subject classification (UKÄ)

  • Language Technology (Computational Linguistics)

Free keywords

  • deep learning
  • Long Short-Term Memory
  • Mel Frequency Cepstral Coefficients
  • perceptual centers
