DESCRIPTION

    


The Short-time Chirp Transform
and its application to speech signals

1. Speech model

As we already know, the general model of speech is defined as:

where h(t) is the impulse response of the vocal tract and p(t) represents the series of pulses generated by the vocal folds. In normal conversation the pitch fluctuates, so the pulses at the instants t_m are not equally spaced. If we describe such a case in a generic short window [0, T], assuming that the pitch changes linearly, the excitation can be written as:

Here fo represents the mean frequency (pitch) at the center of the segment, and gamma is the normalised rate of frequency change (pitch change). Filtering such an excitation signal with h(t) yields speech with changing pitch:
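A minimal Python sketch of this model may help. It assumes that the pitch inside the window follows f(t) = fo * (1 + gamma * (t - T/2)), so that fo is the pitch at the center of the segment, and it uses a single damped resonance as a toy vocal tract. The function names, the formant values and the sampling rate are all hypothetical choices made for illustration.

```python
import numpy as np

def chirped_pulse_times(f0, gamma, T, grid=20001):
    """Pulse instants in [0, T] for the assumed pitch f(t) = f0 * (1 + gamma * (t - T/2))."""
    t = np.linspace(0.0, T, grid)
    # Pitch phase in cycles: the integral of f(t) from 0 to t.
    cycles = f0 * ((1.0 - 0.5 * gamma * T) * t + 0.5 * gamma * t**2)
    # One pulse each time the phase reaches an integer value.
    # The phase is monotonic as long as |gamma| * T / 2 < 1, so it can be inverted by interpolation.
    n_pulses = int(np.floor(cycles[-1])) + 1
    return np.interp(np.arange(n_pulses), cycles, t)

def synthesize(f0=120.0, gamma=3.0, T=0.05, fs=16000.0, formant=700.0, bandwidth=100.0):
    """Return the excitation p and the filtered signal s = h * p, sampled at fs."""
    n = int(round(T * fs))
    p = np.zeros(n)
    idx = np.round(chirped_pulse_times(f0, gamma, T) * fs).astype(int)
    p[idx[idx < n]] = 1.0  # unit pulses at the nearest sample positions
    # Toy vocal tract (assumption): impulse response of one damped resonance, 20 ms long.
    th = np.arange(int(0.02 * fs)) / fs
    h = np.exp(-np.pi * bandwidth * th) * np.sin(2.0 * np.pi * formant * th)
    s = np.convolve(p, h)[:n]  # filtering = convolution with h
    return p, s

p, s = synthesize()
print("pulse spacing (ms):", np.diff(chirped_pulse_times(120.0, 3.0, 0.05)) * 1e3)
```

With a positive gamma the printed pulse spacing shrinks along the window, which is the unequal spacing described above.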

2. Fourier analysis

If we analyse such a segment by applying the Fourier transform,

we get a spectral representation in which the formants are clearly visible:

Even though the formant structure is well described, the representation of the harmonic structure of the speech is distorted, mainly at high frequencies. This happens because each harmonic component sweeps through a range of frequencies during the analysed segment [0, T], and this range grows with the harmonic number. This sweeping is depicted in the next figure:
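The size of this effect can also be checked numerically. The sketch below assumes the linear pitch model fo * (1 + gamma * t) on a time axis centred on the segment, so the k-th harmonic sweeps about k * fo * |gamma| * T Hz over the window. It then measures the peak of the Hann-windowed Fourier spectrum of a single harmonic. All parameter values and function names are illustrative assumptions.

```python
import numpy as np

fs, T = 16000.0, 0.04        # sampling rate (Hz) and segment length (s), example values
f0, gamma = 100.0, 2.0       # mean pitch (Hz) and normalised pitch change (1/s), example values
n = int(round(fs * T))
t = (np.arange(n) - (n - 1) / 2.0) / fs   # time axis centred on the segment

def sweep_hz(k):
    """Frequency range covered by the k-th harmonic during the segment."""
    return k * f0 * abs(gamma) * T

def harmonic_peak(k, nfft=8192):
    """Peak magnitude of the windowed Fourier spectrum of the k-th chirped harmonic alone."""
    phase = 2.0 * np.pi * k * f0 * (t + 0.5 * gamma * t**2)
    spectrum = np.abs(np.fft.rfft(np.cos(phase) * np.hanning(n), nfft))
    return spectrum.max()

for k in (2, 40):
    print(f"harmonic {k}: sweep {sweep_hz(k):.0f} Hz, spectral peak {harmonic_peak(k):.1f}")
```

With these values the 2nd harmonic sweeps 16 Hz while the 40th sweeps 320 Hz, more than three times the 100 Hz spacing between harmonics, so its energy is spread out and its spectral peak is clearly lower.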

  

3. Chirp Analysis

Let's replace the basis of the FT, which is composed of sinusoids, with a NEW BASIS composed of chirped harmonics:

Then the basis of the new transform could be written as

where alpha is the only extra parameter, representing the estimated chirp rate used for the analysis. We can then define the Chirp transform as follows:

By analysing the previous speech segment with the proposed Chirp transform (ChT) and comparing the result with the Fourier analysis we get:

 

As we see, the chirp analysis of a speech segment delivers a fine representation of the harmonic components. The resulting spectral envelope is a smoothed version of the vocal tract transfer function, where the smoothing follows the direction of the pitch change (instead of being constant, as in the case of the FT, or following the changes in formant position).

Notes:
- to achieve a precise representation, a good knowledge of the frequency variation rate is required (so that alpha is close to gamma)
- the FT is a particular case of the ChT, where alpha = 0
- the chirp transform is orthogonal, therefore suitable for resynthesis
- the chirp transform covers the whole time-frequency space
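The idea of the chirp analysis can be sketched in a few lines of Python. The code below is only an assumed form of the transform: it takes the basis to be exp(-j 2 pi f (t + alpha t^2 / 2)) on a time axis centred on the segment, which matches the description (harmonics whose frequency changes linearly, FT recovered for alpha = 0). It includes no extra normalisation, so it makes no claim about orthogonality. The name `chirp_transform` and its parameters are hypothetical.

```python
import numpy as np

def chirp_transform(x, fs, freqs, alpha, window=None):
    """Evaluate an assumed chirp transform of segment x at the given frequencies (Hz)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = (np.arange(n) - (n - 1) / 2.0) / fs     # time axis centred on the segment
    if window is None:
        window = np.hanning(n)
    warped = t + 0.5 * alpha * t**2             # linear frequency change <=> quadratic phase
    basis = np.exp(-2j * np.pi * np.outer(freqs, warped))
    return basis @ (x * window)

# Example: one chirped harmonic at 3 kHz with gamma = 3 (illustrative values).
fs, n, gamma = 16000.0, 640, 3.0
t = (np.arange(n) - (n - 1) / 2.0) / fs
x = np.cos(2.0 * np.pi * 3000.0 * (t + 0.5 * gamma * t**2))
matched = np.abs(chirp_transform(x, fs, [3000.0], alpha=gamma))[0]
fourier = np.abs(chirp_transform(x, fs, [3000.0], alpha=0.0))[0]
print("peak with alpha = gamma:", matched, " peak with alpha = 0:", fourier)
```

When alpha matches gamma the harmonic collapses back into a sharp peak, while alpha = 0 gives the smeared Fourier result. With alpha = 0 the sketch equals the windowed DFT up to a phase factor caused by the centred time axis.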

4. The Short-time Chirp Transform

Based on the previous notes, the Short-time Chirp transform could be written as

where w is the analysis window and M is the time step. The basis is defined as:
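A short-time version can be sketched by sliding the window w over the signal in steps of M samples and using a separate chirp rate for every segment. As before, the basis exp(-j 2 pi f (t + alpha_m t^2 / 2)) with a centred time axis is an assumption, and the function name and arguments are hypothetical.

```python
import numpy as np

def short_time_chirp_transform(x, fs, frame_len, hop, freqs, alphas):
    """Frame-wise chirp analysis: row m holds the transform of segment m with rate alphas[m]."""
    x = np.asarray(x, dtype=float)
    w = np.hanning(frame_len)                                   # analysis window w
    t = (np.arange(frame_len) - (frame_len - 1) / 2.0) / fs     # centred time axis
    n_frames = 1 + (len(x) - frame_len) // hop                  # hop plays the role of the time step M
    out = np.empty((n_frames, len(freqs)), dtype=complex)
    for m in range(n_frames):
        segment = x[m * hop : m * hop + frame_len] * w
        warped = t + 0.5 * alphas[m] * t**2                     # per-segment chirp basis
        out[m] = np.exp(-2j * np.pi * np.outer(freqs, warped)) @ segment
    return out

# Usage with placeholder data: all alphas set to 0 gives the magnitudes of an ordinary STFT.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
freqs = np.fft.rfftfreq(400, 1.0 / 8000.0)
S = short_time_chirp_transform(x, 8000.0, 400, 200, freqs, np.zeros(9))
print(S.shape)
```

Building the basis inside the loop keeps the sketch short. A practical implementation would probably cache or vectorise it.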

5. Short-time Chirp Analysis

To analyse a continuous signal we need the value of the frequency variation rate alpha_m for each segment. This is the normalised derivative of the pitch trajectory belonging to the speaker we are analysing. For segment m with a mean frequency fo:

Note:
If more than one source is present in the recording, the spectral representation of the tracked source will be enhanced, while the spectral representation of the others will be smeared, mainly in time-frequency regions where the derivatives of their pitch trajectories differ strongly.
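The frequency variation rate alpha_m described in this section can be estimated from a pitch track. The sketch below assumes that one pitch value per segment is available (from any pitch tracker), that all segments are voiced, and that the derivative is approximated by finite differences. The function name `chirp_rates` is hypothetical.

```python
import numpy as np

def chirp_rates(pitch_hz, hop_seconds):
    """Normalised derivative of the pitch trajectory: alpha_m = (df/dt at segment m) / f_m."""
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    slope = np.gradient(pitch_hz, hop_seconds)   # finite-difference derivative in Hz per second
    return slope / pitch_hz                      # assumes voiced segments only (pitch > 0)

# Example: pitch rising linearly from 100 Hz at 200 Hz/s, one value every 10 ms.
times = np.arange(10) * 0.01
pitch = 100.0 + 200.0 * times
print(chirp_rates(pitch, 0.01))
```

For a linearly rising pitch the slope is constant, so alpha_m falls slightly from segment to segment as the mean frequency grows.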

Finally, here is an example of time-frequency analysis with the STChT:

You can find more figures under the Demo samples section.

Last changes: 04/17/2004