Linear Predictive Coding (LPC)

Introductory Remarks

  • In speech coding, the properties of speech have to be taken into account (unlike in channel coding)
  • For telephone applications, the speech signal has to be re-synthesized (unlike in speech recognition)
  • Frames with a length of 10-30 ms are considered
  • The signal is considered stationary within a frame
  • The LPC coefficients are calculated for each such frame
  • The synthesized signal can be re-used
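
The frame-based processing described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name split_into_frames, the 8 kHz sample rate, and the 20 ms non-overlapping frames are assumptions chosen for the example (practical coders often use overlapping frames).

```python
def split_into_frames(signal, sample_rate, frame_ms=20.0):
    """Split a sequence of samples into consecutive, non-overlapping frames.

    frame_ms should lie in the 10-30 ms range so that the signal can be
    treated as approximately stationary within each frame.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    # Trailing samples that do not fill a complete frame are dropped.
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# Assumed example input: one second of dummy samples at 8 kHz.
samples = list(range(8000))
frames = split_into_frames(samples, sample_rate=8000, frame_ms=20.0)
```

One set of LPC coefficients would then be computed for each element of frames.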

The Analysis-by-Synthesis System

  • A short-term LP synthesis filter representing spectral information of the speech signal
  • A long-term LP synthesis filter representing the pitch structure (optional)
  • A perceptual weighting filter, shaping the error in such a way that the quantization noise is masked by high-energy formants
  • A mean squared error (MSE) criterion, under which the energy of the weighted error signal is minimized
  • An excitation source, which is selected according to the error signal
  • The excitation is either white noise (unvoiced) or a uniform (periodic) impulse train (voiced)
  • Assume that q=0 (autoregressive or AR model), i.e. any zeros are ignored, since they only add linear phase
  • Speech s(n) is filtered by an inverse (or predictor) filter, the inverse of an all-pole filter H(z)

          A(z) = \frac{1}{H(z)}          (1)

          A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}          (2)

  • and the output e(n) is called the error or residual signal

          e(n) = s(n) - \sum_{k=1}^{p} a_k s(n-k)          (3)
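
The inverse filtering step can be illustrated with a short sketch. It assumes the common sign convention in which the prediction is sum_k a_k s(n-k), so that the residual is the speech sample minus its prediction (some texts use the opposite sign for a_k). The function names and the coefficient values are hypothetical and serve only to show that the all-pole synthesis filter undoes the inverse filter.

```python
def lpc_residual(s, a):
    """Inverse (analysis) filter: e(n) = s(n) - sum_k a[k-1] * s(n-k).

    Samples before the start of the sequence are taken to be zero.
    """
    p = len(a)
    e = []
    for n in range(len(s)):
        prediction = sum(a[k - 1] * s[n - k] for k in range(1, p + 1) if n - k >= 0)
        e.append(s[n] - prediction)
    return e


def lpc_synthesize(e, a):
    """All-pole synthesis filter: s(n) = e(n) + sum_k a[k-1] * s(n-k)."""
    p = len(a)
    s = []
    for n in range(len(e)):
        prediction = sum(a[k - 1] * s[n - k] for k in range(1, p + 1) if n - k >= 0)
        s.append(e[n] + prediction)
    return s


# Assumed example values (not taken from real speech).
a = [0.9, -0.2]
speech = [1.0, 0.5, -0.3, 0.8, 0.1]
residual = lpc_residual(speech, a)
rebuilt = lpc_synthesize(residual, a)  # matches speech up to rounding
```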

    Least-Squares Autocorrelation Method

  • The classical least-squares method minimizes the mean energy in the error signal over a frame of speech data
  • The speech signal is multiplied by a Hamming window (or similar window)

          x(n) = w(n) \, s(n)          (4)

  • The LPC coefficients describe a smoothed average of the signal
  • Let E be the error energy


          E = \sum_{n} e^2(n) = \sum_{n} \left[ x(n) - \sum_{k=1}^{p} a_k x(n-k) \right]^2          (5)

    where e(n) is the residual corresponding to the windowed signal x(n)

  • The coefficients are found by setting the partial derivatives of E with respect to each coefficient to zero

          \frac{\partial E}{\partial a_k} = 0, \quad k = 1, 2, \ldots, p          (6)

    This yields p linear equations in p unknown filter coefficients

  • Finally, we obtain the minimum residual energy, or prediction error energy, for a p-pole model

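          E_p = R(0) - \sum_{k=1}^{p} a_k R(k)          (7)

    where R(k) = \sum_{n} x(n) x(n-k) is the autocorrelation of the windowed signal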
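
The steps of the autocorrelation method (windowing, error minimization, solving p linear equations) can be sketched in plain Python. Several choices here are assumptions rather than part of the material above: the prediction uses the convention sum_k a_k x(n-k), R(k) denotes the autocorrelation of the windowed frame, the linear equations are solved with the Levinson-Durbin recursion (one common choice for this Toeplitz system), and the test frame is synthetic.

```python
import math
import random


def hamming(N):
    """Hamming window of length N (N >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def autocorrelation(x, max_lag):
    """R(k) = sum_n x(n) * x(n-k) for k = 0..max_lag."""
    N = len(x)
    return [sum(x[n] * x[n - k] for n in range(k, N)) for k in range(max_lag + 1)]


def levinson_durbin(R, p):
    """Solve sum_k a_k R(|i-k|) = R(i), i = 1..p.

    Returns (a, E_p): the p predictor coefficients and the minimum
    residual energy of the p-pole model.
    """
    a = [0.0] * (p + 1)  # a[0] is unused; a[1..p] hold the coefficients
    E = R[0]
    for i in range(1, p + 1):
        if E <= 0.0:
            raise ValueError("residual energy is not positive; frame may be all zeros")
        # Reflection coefficient for model order i.
        k = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / E
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        E *= 1.0 - k * k
    return a[1:], E


def lpc_autocorrelation(frame, p):
    """Window one frame and return (coefficients, minimum residual energy)."""
    w = hamming(len(frame))
    x = [wn * sn for wn, sn in zip(w, frame)]
    return levinson_durbin(autocorrelation(x, p), p)


# Hypothetical test frame: 160 samples of a stable second-order AR process
# driven by seeded Gaussian noise (20 ms at an assumed 8 kHz sample rate).
rng = random.Random(0)
frame = []
for n in range(160):
    s1 = frame[n - 1] if n >= 1 else 0.0
    s2 = frame[n - 2] if n >= 2 else 0.0
    frame.append(1.3 * s1 - 0.6 * s2 + rng.gauss(0.0, 1.0))

coeffs, min_energy = lpc_autocorrelation(frame, p=2)
```

The recursion exploits the Toeplitz structure of the autocorrelation matrix, so it needs on the order of p^2 operations instead of the p^3 of general elimination. The returned energy equals R(0) minus the sum of a_k R(k), which can be checked numerically.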

    Problems

    Only two excitation options are available (white noise or an impulse train), which limits the achievable speech quality
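
The two excitation types can be sketched as below, which makes the limitation concrete: every frame is driven either by a strictly periodic impulse train or by noise, with nothing in between. The function names, the pitch period of 80 samples (100 Hz at an assumed 8 kHz sample rate), and the unit-variance Gaussian noise are assumptions for illustration.

```python
import random


def impulse_train(length, pitch_period):
    """Voiced excitation: one unit impulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]


def white_noise(length, seed=0):
    """Unvoiced excitation: zero-mean, unit-variance Gaussian noise."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def make_excitation(length, voiced, pitch_period=80, seed=0):
    """Select one of the two excitation types with a binary voicing decision."""
    if voiced:
        return impulse_train(length, pitch_period)
    return white_noise(length, seed)


voiced_frame = make_excitation(160, voiced=True, pitch_period=80)
unvoiced_frame = make_excitation(160, voiced=False)
```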