Phase estimation in single-channel speech enhancement using phase invariance constraints

Michael Pirolt ; Johannes Stahl ; Pejman Mowlaee ; Vasili I. Vorobiov ; Siarhei Y. Barysenka ; Andrew G. Davydov

Phase-aware signal processing has received increasing interest in many speech applications. The success of phase-aware processing depends strongly on the robustness of the clean spectral phase estimates obtained from a noisy observation. In this paper, we propose a novel harmonic phase estimator that relies on the phase quasi-invariance (PQI) property and exploits relations between harmonics through the phase structure. We present speech quality results obtained in speech enhancement to demonstrate the effectiveness of the proposed phase estimator [1] compared with the noisy phase and other phase estimation benchmarks.
Below, we present audio samples demonstrating the speech enhancement results obtained with the PQI-based phase estimator. As benchmarks, we also report results obtained with the maximum a posteriori (MAP) phase estimator [2] and short-time Fourier transform phase improvement (STFTPI) [3], as well as the clean-phase upper bound.

For each sample, we also report instrumental predictions of perceived quality and speech intelligibility, using the measures PESQ and STOI, respectively:

- Audio samples -

Female speech: ''Lay Green at P One Please'' in babble noise at SNR = 5 dB:

PESQ: Noisy = 1.57, Proposed (blind f0) = 1.92.

STOI: Noisy = 0.71, Proposed (blind f0) = 0.76.

Female speech: ''Lay Green at P One Please'' in white noise at SNR = 5 dB:

PESQ: Noisy = 1.36, Proposed (blind f0) = 1.68.

STOI: Noisy = 0.72, Proposed (blind f0) = 0.76.
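
As a rough summary of the scores listed above, the absolute gains of the proposed estimator over the noisy input can be tabulated with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the helper name `improvement` are our own assumptions and are not part of the evaluation code behind [1]; the numbers are copied from the two samples above.

```python
# Scores copied from the two samples above, stored as (noisy, proposed with blind f0).
# The data layout and the helper name are hypothetical, chosen for illustration only.
scores = {
    ("babble", "PESQ"): (1.57, 1.92),
    ("babble", "STOI"): (0.71, 0.76),
    ("white", "PESQ"): (1.36, 1.68),
    ("white", "STOI"): (0.72, 0.76),
}


def improvement(noisy, proposed):
    """Return the absolute gain over the noisy input, rounded to two decimals."""
    return round(proposed - noisy, 2)


# Absolute gain for every (noise type, measure) pair.
deltas = {key: improvement(*pair) for key, pair in scores.items()}

for (noise, measure), delta in deltas.items():
    print(f"{noise:>6} noise, {measure}: +{delta:.2f}")
```

For these two samples the gains come to 0.35 and 0.32 PESQ points and 0.05 and 0.04 STOI points in babble and white noise, respectively.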