Welcome to the Audio Processing Lab

This website presents recent updates on audio signal processing and speech processing, along with the recent advances made by the Audio Processing Lab led by Prof. Pejman Mowlaee. Our recent findings, audio demos, and listening examples are provided for speech enhancement, source separation, robust automatic speech/speaker recognition, speech quality estimation, and artificial bandwidth extension. This site is also the companion site for the book "Single Channel Phase-Aware Signal Processing in Speech Communication: Theory and Practice", published by Wiley.


Book at John Wiley and Sons

Abstract: The book presents a complete overview of the challenging new topic of phase-aware signal processing and provides a comprehensive guide to Phase-Aware Signal Processing in Speech Communication. After a rich history of the importance of phase in the literature, the basic problems and required fundamentals of phase processing are described. In the second part of the book, several applications are considered to exemplify the usefulness of phase processing. The book is available to order online from John Wiley & Sons, Inc. via the following links:

Audio demos and some listening examples


Research opportunity at PhaseLab

PhaseLab is seeking to hire research assistants on the topic of Signal Processing for Speech Communication. The positions are embedded in the FWF project P 28070-N33 entitled "Phase-Aware Signal Processing for Speech Communication", led by Prof. Pejman Mowlaee. For more details, see here, or send your application (CV, motivation letter, list of grades, and names of references) by email to Email.


Recent Events and Some Background

The following events have been organized to address the topic of phase-aware speech signal processing and its applications:
    1. Speech enhancement: Conventional STFT-based single-channel speech enhancement methods use time-frequency information to reduce noise. These methods rely on modifying the amplitude of the noisy spectrum while leaving the noisy phase unmodified. Although phase processing has received less attention, researchers have more recently started to investigate the impact of spectral phase information in speech enhancement. For a recent overview of spectral phase estimation from noisy speech, we refer to "Link".

      Iterative Phase-Aware Speech Enhancement Audio Demos
      Phase Estimation from Noisy Observation Audio Demos
      Modulation-based Speech Enhancement Audio Demos
    2. Single-channel source separation: Following the first speech separation and recognition challenge in 2006, a more realistic scenario was released as SiSEC and CHiME in 2011, in which realistic background noise was added to GRID sentences recorded in a reverberant environment. In the following contributions, we investigated the trade-off between a speech enhancement and noise estimation approach and a speaker-dependent source separation algorithm. For a recent overview of source separation results, we refer to "Link".
    3. Artificial bandwidth extension: Because telephony speech has limited bandwidth, extending a narrowband signal to a wideband signal is of great importance for improving the quality and intelligibility of the speech signal. We present results for the following paper and compare them with the benchmark method Link.


    4. Automatic speech recognition: Spectral phase information has conventionally been ignored in automatic speech recognition. However, several recent works have demonstrated the usefulness of phase-derived features, e.g., group delay and its modified version, and of combining them with conventional MFCC features to improve ASR performance. We have recently demonstrated the importance of phase-aware signal processing in automatic speech recognition applications Link.


    5. Speaker recognition: We have developed several robust speaker recognition methods in which either robust features or an enhanced speech signal is provided to the speaker recognizer Link.


    6. Speech quality estimation: Evaluating the performance of a new signal enhancement system is time-consuming and tedious. A reliable instrumental metric for predicting subjective listening results is therefore of great importance in developing or implementing a new method. Here we present contributions that study how reliably existing metrics predict the perceived quality or speech intelligibility achievable by a single-channel signal enhancement method. For a recent overview of speech quality estimation, we refer to "Link".
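
As a rough illustration of the conventional approach described under item 1 (Speech enhancement), the sketch below modifies only the spectral amplitude of one frame and reuses the noisy phase for resynthesis. It is a minimal sketch under stated assumptions, not the lab's method: plain spectral subtraction is swapped in as the amplitude rule, the names `enhance_frame`, `noise_mag`, and `floor` are hypothetical, and a naive DFT on a single frame stands in for a full STFT with windowing and overlap-add.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def enhance_frame(noisy_frame, noise_mag, floor=0.05):
    """Amplitude-only enhancement of one frame (hypothetical helper).

    noise_mag is an assumed per-bin noise magnitude estimate; it should be
    symmetric across bins so that the resynthesized frame stays real-valued.
    """
    enhanced = []
    for bin_value, noise in zip(dft(noisy_frame), noise_mag):
        magnitude = abs(bin_value)
        phase = cmath.phase(bin_value)  # noisy phase, copied unchanged
        # Spectral subtraction with a spectral floor (assumed amplitude rule).
        new_magnitude = max(magnitude - noise, floor * magnitude)
        enhanced.append(cmath.rect(new_magnitude, phase))
    return idft(enhanced)
```

A phase-aware method would replace the copied `phase` value with an estimate of the clean spectral phase instead of taking it from the noisy observation.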