Phase estimation in single-channel speech enhancement using phase invariance constraints

Michael Pirolt ; Johannes Stahl ; Pejman Mowlaee ; Vasili I. Vorobiov ; Siarhei Y. Barysenka ; Andrew G. Davydov

Phase-aware signal processing has received increasing interest in many speech applications. Its success depends strongly on how robustly the clean spectral phase can be estimated from a noisy observation. In this paper, we propose a novel harmonic phase estimator that relies on the phase quasi-invariance (PQI) property, which exploits relations between harmonics through the phase structure. We report speech quality results obtained in speech enhancement to demonstrate the effectiveness of the phase estimator proposed in [1] compared with the noisy phase and other phase estimation benchmarks.
Below, we present audio samples demonstrating the speech enhancement results obtained with the PQI-based phase estimator. As benchmarks, we also report results obtained with the maximum a posteriori (MAP) phase estimator [2] and short-time Fourier transform phase improvement (STFTPI) [3], as well as the clean-phase upper bound.

[1] M. Pirolt, J. Stahl, P. Mowlaee, V. Vorobiov, S. Barysenka, A. Davydov, "Phase Estimation in Single-Channel Speech Enhancement Using Phase Invariance Constraints", ICASSP, pp. 5585-5589, 2017.
[2] J. Kulmer, P. Mowlaee, "Harmonic Phase Estimation in Single-Channel Speech Enhancement Using Von Mises Distribution and Prior SNR", Proc. IEEE Int. Conf. Acoust. Speech Signal Processing, pp. 5063-5067, Apr. 2015.
[3] M. Krawczyk, T. Gerkmann, "STFT Phase Reconstruction in Voiced Speech for an Improved Single-Channel Speech Enhancement", IEEE Trans. Audio Speech and Language Process., vol. 22, no. 12, pp. 1931-1940, Dec. 2014.

For each sample, we report instrumental predictions of perceived quality and speech intelligibility, measured with PESQ and STOI, respectively:

- Audio samples -

Female speech: "Lay Green at P One Please" in babble noise at SNR = 5 dB:

PESQ: Noisy = 1.57, Proposed (blind f0) = 1.92.
STOI: Noisy = 0.71, Proposed (blind f0) = 0.76.

Female speech: "Play Green at P One Please" in white noise at SNR = 5 dB:

PESQ: Noisy = 1.36, Proposed (blind f0) = 1.68.
STOI: Noisy = 0.72, Proposed (blind f0) = 0.76.
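
To make the gains easier to compare across the two noise types, the scores listed above can be turned into absolute improvements over the noisy input. The following is only a minimal sketch: it copies the reported numbers by hand and does not compute PESQ or STOI itself. The names RESULTS, improvement and summarize are illustrative assumptions and are not part of the paper or of any toolkit.

```python
# Scores copied from the results listed above: (noisy, proposed with blind f0).
# RESULTS, improvement and summarize are hypothetical names used for illustration.
RESULTS = {
    "babble, 5 dB": {"PESQ": (1.57, 1.92), "STOI": (0.71, 0.76)},
    "white, 5 dB": {"PESQ": (1.36, 1.68), "STOI": (0.72, 0.76)},
}


def improvement(noisy, proposed):
    """Return the absolute gain of the proposed estimator over the noisy input."""
    # Rounded to two decimals, matching the precision of the reported scores.
    return round(proposed - noisy, 2)


def summarize(results):
    """Map each (condition, measure) pair to its rounded improvement."""
    return {
        (condition, measure): improvement(*scores)
        for condition, measures in results.items()
        for measure, scores in measures.items()
    }


if __name__ == "__main__":
    for (condition, measure), delta in summarize(RESULTS).items():
        print(f"{condition:13s} {measure}: +{delta:.2f}")
```

With the numbers above, this gives a PESQ gain of 0.35 in babble noise and 0.32 in white noise, and a STOI gain of 0.05 and 0.04, respectively.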
