Johannes Stahl's page

This page presents an overview of the research I conducted from 2016 to 2019 at the Signal Processing and Speech Communication Laboratory, Graz University of Technology. Most algorithms I developed during this time are dedicated to the single-channel speech enhancement problem, more specifically to non-data-driven, STFT-based, and low-resource approaches to solving it. The focus of my work was on the STFT phase and its role in such algorithms. You can find implementations of the methods presented in my publications in the GitLab repository that is linked below.
You may also have a look at my PhD thesis, which summarizes my work and which I successfully defended on 06.02.2019.


In case you have questions regarding my work, feel free to contact me:

Johannes Stahl
johannes.kw.stahl (at) gmail.com
GitLab
Google Scholar



Sinusoidal signal modeling for STFT-based speech enhancement algorithms

We were interested in possibilities to incorporate harmonic signal modeling into speech enhancement algorithms. First, we implemented an expectation-maximization algorithm for estimating harmonic signal parameters from observed, noise-corrupted speech data. The estimated parametric representation of the speech signal was subsequently used to reconstruct voiced speech segments. We further derived and implemented a simultaneous detection-estimation framework for speech enhancement that relies on the harmonic plus noise model for speech. As an alternative to the expectation-maximization algorithm that we proposed before, the harmonic signal (parameter) estimation step is performed in a pitch-synchronous analysis stage, successfully circumventing the need to iteratively estimate the set of harmonic signal parameters.

Both methods successfully reconstruct voiced speech segments and the detection-estimation approach also handles unvoiced segments effectively.
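As a rough illustration of why a pitch-synchronous analysis window can avoid iterative estimation, consider the following minimal sketch. It is not the implementation from the publications below. It assumes that the pitch period is already known and equals an integer number of samples, and that the frame is noise-free; the function names `estimate_harmonics` and `synthesize` are hypothetical. Over a whole number of pitch periods the harmonics are mutually orthogonal, so the amplitude and phase of each harmonic follow from a single projection.

```python
import cmath
import math


def estimate_harmonics(frame, period, num_harmonics):
    """Return (amplitude, phase) per harmonic of a frame spanning whole pitch periods.

    Assumes the pitch period is a known integer number of samples and that
    every harmonic lies below the Nyquist frequency (num_harmonics < period / 2).
    """
    if len(frame) % period != 0:
        raise ValueError("frame must span an integer number of pitch periods")
    if num_harmonics >= period / 2:
        raise ValueError("harmonics must stay below the Nyquist frequency")
    params = []
    for h in range(1, num_harmonics + 1):
        omega = 2.0 * math.pi * h / period
        # Project the frame onto the h-th complex exponential.
        projection = sum(x * cmath.exp(-1j * omega * n) for n, x in enumerate(frame))
        coefficient = 2.0 * projection / len(frame)
        params.append((abs(coefficient), cmath.phase(coefficient)))
    return params


def synthesize(params, period, num_samples):
    """Rebuild the harmonic signal from (amplitude, phase) pairs."""
    return [
        sum(a * math.cos(2.0 * math.pi * h * n / period + phi)
            for h, (a, phi) in enumerate(params, start=1))
        for n in range(num_samples)
    ]


# Hypothetical voiced frame: pitch period of 40 samples, three harmonics.
true_params = [(1.0, 0.3), (0.5, -1.0), (0.25, 2.0)]
frame = synthesize(true_params, period=40, num_samples=80)
estimated = estimate_harmonics(frame, period=40, num_harmonics=3)
```

With a noisy frame the same projection gives the least-squares fit of the harmonic model, because the basis functions stay orthogonal over the pitch-synchronous window. A window that does not span whole pitch periods loses this property, and the harmonic parameters then have to be estimated jointly.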

Publications:

J. Stahl and P. Mowlaee
A pitch-synchronous simultaneous detection-estimation framework for speech enhancement
IEEE/ACM Trans. Audio, Speech, and Language Processing, vol. 26, no. 2, pp. 436 - 450, Feb 2018


Paper | Implementation | Audio

J. Stahl and P. Mowlaee
Iterative harmonic speech enhancement
in Proc. ITG Symposium on Speech Communication, Oct 2016, pp. 1 - 5.


Paper


Correlations w.r.t. frequency in STFT-based speech enhancement algorithms

We investigated spectral correlations in speech signals and derived and implemented algorithms that effectively take them into account. The key problem to solve was to estimate the second-order statistics of the speech and noise signals. As the problem is aggravated by the need to also estimate phase correlations, we proposed a statistical model that circumvents the need to estimate them. We analyzed the validity of this model and implemented a speech enhancement algorithm that relies on it.

The proposed method effectively suppresses noise while keeping the speech distortions very low.
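To make the term "second-order statistics" concrete, here is a minimal sketch of a sample covariance matrix across frequency bins, averaged over STFT frames. It is a generic textbook estimator, not the model proposed in the publications below; the function name `sample_covariance` and the toy spectra are hypothetical.

```python
def sample_covariance(spectra):
    """Average the outer products Y Y^H over a list of complex spectra."""
    num_bins = len(spectra[0])
    cov = [[0j] * num_bins for _ in range(num_bins)]
    for spectrum in spectra:
        for k in range(num_bins):
            for m in range(num_bins):
                cov[k][m] += spectrum[k] * spectrum[m].conjugate()
    return [[value / len(spectra) for value in row] for row in cov]


# Two hypothetical frames with three frequency bins each.
spectra = [
    [1 + 0j, 0 + 1j, 1 + 1j],
    [2 + 0j, 0 + 2j, 0 + 0j],
]
cov = sample_covariance(spectra)
```

The diagonal entries are the real-valued per-bin powers, while the off-diagonal entries are complex-valued: accounting for correlations between bins means estimating a phase in addition to a magnitude for every pair of bins.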

Publications:

J. Stahl, S. Wood, and P. Mowlaee
Single-channel speech enhancement with correlated spectral components: Limits - potential
submitted to IEEE/ACM Trans. Audio, Speech, and Language Processing, 2018.


Implementation | Audio

J. Stahl, S. Wood, and P. Mowlaee
Overcoming covariance matrix phase sensitivity in single-channel speech enhancement with correlated spectral components
in Proc. ITG Symposium on Speech Communication, Oct 2018, pp. 286 - 290.


Paper | Implementation | Audio


Correlations w.r.t. time in STFT-based speech enhancement algorithms

We analyzed inter-frame correlations in STFT representations of speech signals. Using the obtained insights, we derived an algorithm for speech PSD/SNR estimation as well as a subband Kalman filter for speech enhancement. The Kalman filter parameters are estimated from the statistics of the spectral phase.
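The idea of exploiting inter-frame correlation within one subband can be sketched with a scalar complex Kalman filter. The sketch below assumes a first-order autoregressive model for the clean STFT coefficients of a single frequency bin and takes the AR coefficient and both variances as known. This is a simplification: it does not estimate the parameters from the phase statistics as described above, and the name `subband_kalman` as well as all numerical values are hypothetical.

```python
import cmath
import math
import random


def subband_kalman(observations, ar_coeff, process_var, noise_var):
    """Filter the noisy STFT coefficients of one frequency bin over time.

    Model: s[l] = ar_coeff * s[l-1] + w[l],  y[l] = s[l] + v[l],
    with var(w) = process_var and var(v) = noise_var.
    """
    if abs(ar_coeff) >= 1.0:
        raise ValueError("AR(1) model must be stable (|ar_coeff| < 1)")
    estimate = 0j
    # Start from the stationary variance of the AR(1) process.
    error_var = process_var / (1.0 - abs(ar_coeff) ** 2)
    filtered = []
    for y in observations:
        # Prediction from the previous frame.
        predicted = ar_coeff * estimate
        predicted_var = abs(ar_coeff) ** 2 * error_var + process_var
        # Update with the current noisy coefficient.
        gain = predicted_var / (predicted_var + noise_var)
        estimate = predicted + gain * (y - predicted)
        error_var = (1.0 - gain) * predicted_var
        filtered.append(estimate)
    return filtered


def complex_gauss(rng, variance):
    """Draw a circular complex Gaussian sample with the given variance."""
    sigma = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


# Simulated subband trajectory with unit signal and unit noise variance.
rng = random.Random(0)
ar_coeff = 0.95 * cmath.exp(0.3j)
process_var = 1.0 - abs(ar_coeff) ** 2
noise_var = 1.0
clean, noisy, state = [], [], 0j
for _ in range(2000):
    state = ar_coeff * state + complex_gauss(rng, process_var)
    clean.append(state)
    noisy.append(state + complex_gauss(rng, noise_var))
filtered = subband_kalman(noisy, ar_coeff, process_var, noise_var)
```

The phase of `ar_coeff` describes how the coefficient rotates from one frame to the next, which is why the inter-frame phase behavior matters for this kind of filter. With `ar_coeff = 0` the recursion reduces to a memoryless Wiener gain applied to each frame.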

Publications:

J. Stahl and P. Mowlaee
Exploiting temporal correlation in pitch-adaptive speech enhancement
Speech Communication, vol. 111, pp. 1 - 13, 2019.


Implementation | Audio

J. Stahl and P. Mowlaee
A simple and effective framework for a priori SNR estimation
in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2018, pp. 5644 - 5648.


Paper | Implementation | Audio



Other Publications

P. Mowlaee, D. Scheran, J. Stahl, S. Wood, and W. B. Kleijn
Maximum a posteriori speech enhancement based on double spectrum
to appear in Proc. Interspeech, 2019.

P. Mowlaee, J. Stahl, and J. Kulmer
Iterative joint MAP single-channel speech enhancement given non-uniform phase prior
Speech Communication, vol. 86, pp. 85 - 96, 2017.

M. Pirolt, J. Stahl, P. Mowlaee, V. I. Vorobiov, S. Y. Barysenka, and A. G. Davydov
Phase estimation in single-channel speech enhancement using phase invariance constraints
in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, March 2017, pp. 5585 - 5589.

P. Mowlaee, J. Kulmer, J. Stahl, and F. Mayer
Single Channel Phase-Aware Signal Processing in Speech Communication: Theory and Practice
John Wiley & Sons, Ltd, 2016.

J. Fahringer, T. Schrank, J. Stahl, P. Mowlaee, and F. Pernkopf
Phase-aware signal processing for automatic speech recognition
in Proc. Interspeech, 2016, pp. 3374 - 3378.

J. Stahl, P. Mowlaee, and J. Kulmer
Phase-processing for voice activity detection: A statistical approach
in Proc. European Signal Processing Conf., Aug 2016, pp. 1202 - 1206.