Modulation-based Speech Enhancement

We recently proposed speech enhancement in the modulation domain. The following contributions present our speech enhancement results in the modulation domain.
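For readers unfamiliar with the term, a modulation-domain representation is commonly obtained by a second spectral analysis, along time, of the magnitude trajectories of the short-time Fourier transform (STFT). The pure-Python sketch below illustrates this generic two-stage analysis under our own assumptions: periodic Hann windows, a naive DFT, and a single modulation frame spanning the whole signal. The function names are hypothetical, and this is not the double-spectrum method of the papers below, whose exact parameters are not given here.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (slow, but dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len, hop):
    """Acoustic stage: |X(l, k)| for frames l and bins k = 0..frame_len // 2."""
    win = hann(frame_len)
    mags = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], win)]
        spectrum = dft(frame)
        mags.append([abs(c) for c in spectrum[:frame_len // 2 + 1]])
    return mags


def modulation_spectrum(signal, frame_len, hop):
    """Modulation stage: DFT magnitude of each acoustic bin's trajectory.

    Returns a nested list indexed [k][m] (acoustic bin k, modulation bin m).
    With L frames, modulation bin m corresponds to m * (fs / hop) / L Hz, and
    only m = 0..L // 2 are unique because the trajectories are real.
    """
    mags = stft_magnitude(signal, frame_len, hop)
    num_bins = len(mags[0])
    spectrum = []
    for k in range(num_bins):
        trajectory = [frame[k] for frame in mags]  # |X(l, k)| over frames l
        spectrum.append([abs(c) for c in dft(trajectory)])
    return spectrum
```

For example, with a sampling rate of 1000 Hz, frame_len = 32 and hop = 8, a 2024-sample, 250 Hz tone amplitude-modulated at 5 Hz peaks at acoustic bin 8 (250 Hz) and, within that bin's trajectory, at modulation bin 10 (5 Hz). A modulation-domain enhancer would modify such a representation and invert both stages; that part is omitted here.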

M. Blass, P. Mowlaee, B. Kleijn, "Single-Channel Speech Enhancement Using Double Spectrum", in Proc. INTERSPEECH 2016, San Francisco, USA, September 2016 [Audio].
P. Mowlaee, M. Blass, B. Kleijn, "New Results in Modulation-Domain Single-Channel Speech Enhancement", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 11, pp. 2125-2137, November 2017 [Audio].

Below, we present audio samples that demonstrate the effect of the proposed double-spectrum speech enhancement compared with benchmark methods. The results are shown for the fully blind scenario, with male and female utterances corrupted by babble, white, train, and street noise. For comparison, results of the mmsestsa, MMSE-SPU, and modulation spectral subtraction (ModSpecSub) methods are shown. We also report instrumental predictors of perceived quality and speech intelligibility, PESQ and STOI, respectively:
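All samples above are mixed at SNR = 0 dB. To illustrate what that means, the sketch below scales a noise signal so that the global speech-to-noise power ratio, 10 log10(P_s / P_n), equals a target value, using the gain g = sqrt(P_s / (P_n * 10^(SNR/10))), and then adds it to the speech. This is an assumption about the mixing procedure: the papers may use a different SNR definition (for example, one based on active speech level), and `mix_at_snr` is a hypothetical helper.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech at a target global SNR in dB (hypothetical helper).

    Power is the mean of squared samples over the whole signal; the noise is
    truncated to the speech length before its power is measured.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Gain that makes 10*log10(p_speech / (gain**2 * p_noise)) == snr_db.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Under this assumption, `mix_at_snr(speech, noise, 0.0)` produces a mixture in which speech and noise have equal average power, as in the conditions listed here.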

Female speech: "The small boy put the worm on the hook" in babble noise, SNR = 0 dB:

PESQ: Proposed = 2.34, mmsestsa = 2.19, ModSpecSub = 2.17, MMSE-SPU = 2.15, Noisy = 1.95.

STOI: Proposed = 0.6899, mmsestsa = 0.6912, ModSpecSub = 0.6166, MMSE-SPU = 0.6682, Noisy = 0.6677.

Male speech: "She wore warm, fleecy, woolen overalls" in babble noise, SNR = 0 dB:

PESQ: Proposed = 2.48, mmsestsa = 2.40, ModSpecSub = 2.38, MMSE-SPU = 2.20, Noisy = 2.13.

STOI: Proposed = 0.7608, mmsestsa = 0.7576, ModSpecSub = 0.7001, MMSE-SPU = 0.7154, Noisy = 0.7398.

Female speech: "The small boy put the worm on the hook" in white noise, SNR = 0 dB:

PESQ: Proposed = 2.22, mmsestsa = 2.18, ModSpecSub = 2.19, MMSE-SPU = 2.21, Noisy = 1.68.

STOI: Proposed = 0.6522, mmsestsa = 0.6835, ModSpecSub = 0.6522, MMSE-SPU = 0.6774, Noisy = 0.6497.

Male speech: "Her purse was full of useless trash" in train noise, SNR = 0 dB:

PESQ: Proposed = 1.78, mmsestsa = 1.76, ModSpecSub = 1.62, MMSE-SPU = 1.38, Noisy = 1.37.

STOI: Proposed = 0.58, mmsestsa = 0.59, ModSpecSub = 0.56, MMSE-SPU = 0.53, Noisy = 0.58.

Male speech: "Wipe the grease off his dirty face" in street noise, SNR = 0 dB:

PESQ: Proposed = 2.03, mmsestsa = 2.03, ModSpecSub = 2.05, MMSE-SPU = 1.82, Noisy = 1.72.

STOI: Proposed = 0.68, mmsestsa = 0.65, ModSpecSub = 0.65, MMSE-SPU = 0.64, Noisy = 0.67.
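To make the comparison easier to scan, the short Python script below collects the PESQ and STOI values listed above and averages each method's gain over the unprocessed (Noisy) signal. It reuses only these five reported samples; five utterances are too few for firm conclusions, and the papers above contain the full evaluation. The names `SCORES` and `mean_gain_over_noisy` are illustrative.

```python
# PESQ and STOI scores copied from the five samples listed above.
METHODS = ("Proposed", "mmsestsa", "ModSpecSub", "MMSE-SPU", "Noisy")

SCORES = {
    # condition: {metric: (Proposed, mmsestsa, ModSpecSub, MMSE-SPU, Noisy)}
    "female, babble": {"PESQ": (2.34, 2.19, 2.17, 2.15, 1.95),
                       "STOI": (0.6899, 0.6912, 0.6166, 0.6682, 0.6677)},
    "male, babble": {"PESQ": (2.48, 2.40, 2.38, 2.20, 2.13),
                     "STOI": (0.7608, 0.7576, 0.7001, 0.7154, 0.7398)},
    "female, white": {"PESQ": (2.22, 2.18, 2.19, 2.21, 1.68),
                      "STOI": (0.6522, 0.6835, 0.6522, 0.6774, 0.6497)},
    "male, train": {"PESQ": (1.78, 1.76, 1.62, 1.38, 1.37),
                    "STOI": (0.58, 0.59, 0.56, 0.53, 0.58)},
    "male, street": {"PESQ": (2.03, 2.03, 2.05, 1.82, 1.72),
                     "STOI": (0.68, 0.65, 0.65, 0.64, 0.67)},
}


def mean_gain_over_noisy(metric):
    """Average (method score - Noisy score) over all listed conditions."""
    noisy_idx = METHODS.index("Noisy")
    gains = {}
    for i, method in enumerate(METHODS[:noisy_idx]):
        diffs = [c[metric][i] - c[metric][noisy_idx] for c in SCORES.values()]
        gains[method] = sum(diffs) / len(diffs)
    return gains


for metric in ("PESQ", "STOI"):
    for method, gain in mean_gain_over_noisy(metric).items():
        print(f"{metric} {method}: {gain:+.3f}")
```

Averaged over these five samples, the proposed method shows the largest mean PESQ gain over the noisy input (0.40), while mmsestsa shows a slightly larger mean STOI gain than the proposed method (about 0.013 versus 0.011).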