"Single-Channel Speech Enhancement with Correlated Spectral Components: Limits-Potential"

Pejman Mowlaee, Johannes K.W. Stahl


- Audio samples -


Below, we present audio samples demonstrating the impact of the linear multidimensional MMSE STSA (LMDSTSA) estimator, which takes correlated spectral components into account, as proposed in [1]. For comparison, we also present the output of the standard Wiener filter and of the algorithm presented in [2]. The audio samples consist of utterances spoken by male and female speakers, corrupted by different noise types.

[1] P. Mowlaee and J. Stahl, “Single-Channel Speech Enhancement with Correlated Spectral Components: Limits-Potential,” submitted to Speech Communication.

[2] E. Plourde and B. Champagne, “Multidimensional STSA estimators for speech enhancement with correlated spectral components,” IEEE Trans. Signal Process., vol. 59, no. 7, pp. 3013-3024, July 2011.

Female speaker on a bus as an example of a real-world scenario. The recording was part of the CHiME-4 challenge:
Emmanuel Vincent, Shinji Watanabe, Aditya Arie Nugraha, Jon Barker, and Ricard Marxer, "An analysis of environment, microphone and data simulation mismatches in robust speech recognition," Computer Speech and Language, 2016:

Female speaker in a cafe as an example of a real-world scenario. The recording was part of the CHiME-4 challenge:
Emmanuel Vincent, Shinji Watanabe, Aditya Arie Nugraha, Jon Barker, and Ricard Marxer, "An analysis of environment, microphone and data simulation mismatches in robust speech recognition," Computer Speech and Language, 2016:

Male speaker: "The carpet cleaners shampooed our oriental rug." in factory noise, SNR = 10 dB:

Male speaker: "However, the litter remained, augmented by several dozen lunchroom suppers." in babble noise, SNR = 0 dB:

Female speaker: "To further his prestige, he occasionally reads the Wall Street Journal." in babble noise, SNR = 10 dB:

Male speaker: "Are your grades higher or lower than Nancy's?" in factory noise, SNR = 0 dB:

Female speaker: "We always thought we would die with our boots on." in factory noise, SNR = 10 dB: