"Maximum a Posteriori Speech Enhancement Based on Double Spectrum"

Pejman Mowlaee, Daniel Scheran, Johannes Stahl, Sean Wood, Bastiaan Kleijn

- Audio samples -



Below are audio samples comparing the proposed maximum a posteriori (MAP) speech estimator in the double spectrum (DS-MAP) with three benchmark methods: the MMSE-STSA method in the acoustic frequency domain [1], modulation spectral subtraction (ModSpecSub) in the short-time spectral modulation (STSM) domain [2], and DS-Wiener [3].

  • [1] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 6, pp. 1109–1121, Dec. 1984.

  • [2] K. Paliwal, K. Wojcicki, and B. Schwerin, “Single-channel speech enhancement using spectral subtraction in the short-time modulation domain,” Speech Communication, vol. 52, no. 5, pp. 450–475, May 2010.

  • [3] P. Mowlaee, M. Blass, and B. Kleijn, “New results in modulation-domain single-channel speech enhancement,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 11, pp. 2125–2137, Nov. 2017.



Female speech in babble noise at SNR = 5 dB:

Clean
Noisy
ModSpecSub
MMSE-STSA
DS-Wiener
DS-MAP

Male speech in babble noise at SNR = 5 dB:

Clean
Noisy
ModSpecSub
MMSE-STSA
DS-Wiener
DS-MAP

Female speech in factory noise at SNR = 5 dB:

Clean
Noisy
ModSpecSub
MMSE-STSA
DS-Wiener
DS-MAP

Male speech in factory noise at SNR = 5 dB:

Clean
Noisy
ModSpecSub
MMSE-STSA
DS-Wiener
DS-MAP