Speech Synthesis

Advanced Signal Processing Seminar on the topic of Speech Synthesis, held in the summer term 2008.

Contents

This seminar will focus on the two dominant state-of-the-art corpus-based methods for text-to-speech synthesis, namely unit-selection based speech synthesis and the more recently developed Hidden Markov Model (HMM) based speech synthesis. Today's commercial systems mostly employ the unit-selection method.
In unit-selection synthesis, a large speech corpus is recorded and segmented into units. During synthesis, the system selects and concatenates the sequence of units that minimizes both the distance between adjacent units (concatenation cost) and the distance to the target units (target cost).
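The cost minimization described here is commonly solved with a dynamic-programming (Viterbi) search over the candidate units. The following is a minimal sketch of that idea, not the method of any particular system: the function name `select_units`, the representation of units as single numbers (for example pitch values), and the absolute-difference costs are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    targets: Sequence[float],
    candidates: Sequence[Sequence[float]],
    target_cost: Callable[[float, float], float],
    concat_cost: Callable[[float, float], float],
) -> Tuple[List[int], float]:
    """Pick one candidate per target position so that the sum of
    target costs and concatenation costs is minimal (Viterbi search).

    candidates[i] holds the (non-empty) candidate units for targets[i].
    Returns the chosen candidate index per position and the total cost.
    """
    n = len(targets)
    if n == 0:
        return [], 0.0

    # best[i][j]: cheapest total cost of a path ending in candidate j at position i
    best = [[target_cost(targets[0], c) for c in candidates[0]]]
    # back[i][j]: index of the predecessor candidate on that cheapest path
    back = [[-1] * len(candidates[0])]

    for i in range(1, n):
        row, ptr = [], []
        for c in candidates[i]:
            # Cost of reaching c from every candidate at the previous position
            costs = [
                best[i - 1][k] + concat_cost(prev, c)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + target_cost(targets[i], c))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final candidate
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total


# Toy usage: units are single pitch values, both costs are absolute differences.
abs_diff = lambda a, b: abs(a - b)
path, total = select_units(
    targets=[100.0, 120.0, 110.0],
    candidates=[[90.0, 105.0], [150.0, 118.0], [100.0, 111.0]],
    target_cost=abs_diff,
    concat_cost=abs_diff,
)
print(path, total)
```

In practice the units carry many features (spectral, prosodic, contextual) and the two costs are weighted sums of sub-costs; the search structure stays the same.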
In HMM-based speech synthesis, HMMs are trained on a corpus of speech data. During synthesis, a sequence of features (spectral, pitch, and duration features) is generated from the HMMs and used to synthesize the signal.
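As a rough illustration of the generation step, the sketch below expands a sequence of HMM states into a frame-by-frame feature sequence by repeating each state's mean vector for its duration. This is a deliberate simplification and an assumption of this example: the function name `generate_features` and the toy values are hypothetical, and actual HMM-based systems typically also use dynamic (delta) features to obtain smooth trajectories before passing the features to a vocoder.

```python
from typing import List, Sequence


def generate_features(
    state_means: Sequence[Sequence[float]],
    state_durations: Sequence[int],
) -> List[List[float]]:
    """Return one feature vector per frame by holding each state's mean
    vector for that state's duration (in frames).

    Simplified: ignores variances and dynamic features.
    """
    if len(state_means) != len(state_durations):
        raise ValueError("need exactly one duration per state")

    frames: List[List[float]] = []
    for mean, duration in zip(state_means, state_durations):
        # Copy the mean vector once per frame so frames are independent lists
        frames.extend(list(mean) for _ in range(duration))
    return frames


# Toy usage: two states, each mean is [spectral value, pitch in Hz].
frames = generate_features(
    state_means=[[1.0, 120.0], [2.0, 130.0]],
    state_durations=[2, 3],
)
print(len(frames))
```

The piecewise-constant output of this sketch is exactly what the smoothing in real systems is meant to avoid, so it should be read only as a picture of how state durations map states to frames.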
The following list suggests topics for presentation. It is not exhaustive and you can also use different papers to present a topic.

The first meeting (Vorbesprechung) will be on Tuesday 11.3.2008, 16:00-18:00, SR-INW.

General topics

Automatic speech segmentation

  • A. Ljolje, M. D. Riley (1993), Automatic segmentation of speech for TTS. In Proceedings of EUROSPEECH 1993, pages 1445-1448, Berlin, Germany.
  • F. Malfrere, T. Dutoit (1997), High-quality speech synthesis for phonetic speech segmentation. In Proceedings of EUROSPEECH 1997, pages 2631-2634, Rhodes, Greece.
  • L. Wang, Y. Zhao, M. Chu, J. Zhou, Z. Cao (2004), Refining segmental boundaries for TTS database using fine contextual-dependent boundary models. In Proceedings of ICASSP 2004, pages 641-644, Montreal, Canada.
  • A. Park, J. R. Glass (2005), Towards Unsupervised Pattern Discovery in Speech. In Proceedings of ASRU 2005, pages 53-58, San Juan.

Conversational speech

Synthesis of singing

Unit-selection synthesis related topics

Basics and history of unit selection speech synthesis

Concatenation costs and target costs

HMM synthesis related topics

Basics of HMM-based speech synthesis

Speaker interpolation

  • T. Yoshimura, T. Masuko, K. Tokuda, T. Kobayashi, T. Kitamura (1997), Speaker interpolation in HMM-based speech synthesis system. In Proceedings of EUROSPEECH 1997, pages 2523-2526.
  • T. Yoshimura, T. Masuko, K. Tokuda, T. Kobayashi, T. Kitamura (2000), Speaker interpolation for HMM-based speech synthesis system. J. Acoust. Soc. Jpn., 21(4).
  • M. Tachibana, J. Yamagishi, T. Masuko, T. Kobayashi (2005), Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing. IEICE Transactions on Information & Systems, E88-D(11), pages 2484-2491.

Speaker adaptation

  • M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi (2001), Text-to-speech synthesis with arbitrary speaker's voice from average voice. In Proceedings of EUROSPEECH 2001, pages 345-348.
  • J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi (2003), A training method of average voice model for HMM-based speech synthesis. IEICE Transactions on Fundamentals, E86-A(8), pages 1956-1963.
  • Y. Nakano, M. Tachibana, J. Yamagishi, T. Kobayashi (2006), Constrained structural maximum a posteriori linear regression for average-voice-based speech synthesis. In Proceedings of INTERSPEECH 2006.
  • J. Yamagishi, T. Kobayashi (2007), Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training. IEICE Transactions on Information & Systems, E90-D(2), pages 533-543.

Signal generation

  • S. Imai (1983), Cepstral analysis synthesis on the mel frequency scale. In Proceedings of ICASSP 1983, pages 93-96.
  • T. Fukada, K. Tokuda, T. Kobayashi, S. Imai (1992), An adaptive algorithm for mel-cepstral analysis of speech. In Proceedings of ICASSP 1992, pages 137-140.
  • H. Kawahara, I. Masuda-Katsuse, A. de Cheveigné (1999), Restructuring speech representations using a pitch-adaptive time–frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. Speech Communication, 27(3-4), pages 187-207.
  • H. Zen, T. Toda, M. Nakamura, K. Tokuda (2007), Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. IEICE Transactions on Information & Systems, E90-D(1), pages 25-33.
  • R. Maia, T. Toda, H. Zen, Y. Nankaku, K. Tokuda (2007), An excitation model for HMM-based speech synthesis based on residual modeling. In Proceedings of SSW6 workshop.

Context clustering

  • S.J. Young, J.J. Odell, P.C. Woodland (1994), Tree-Based State Tying for High Accuracy Modelling. In Proceedings of ARPA Human Language Technology Workshop, pages 307-312, New Jersey, USA.
  • K. Shinoda, T. Watanabe (1997), Acoustic modeling based on the MDL principle for speech recognition. In Proceedings of EUROSPEECH 1997, pages 99-102.
  • J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi (2003), A context clustering technique for average voice models. IEICE Transactions on Information and Systems, E86-D(3), pages 534-542.


References

  • HMM-based speech synthesis system
  • The Festival Speech Synthesis System
  • Viennese Sociolect and Dialect Synthesis project

    Timetable

    Date | Time | Topic | Presenter(s) | Materials
    Tue 11.3.2008 | 16:00 - 18:00 | Vorbesprechung (preliminary meeting) | M. Pucher | Presentation
    Tue 15.4.2008 | 16:00 - 19:00 | Signal Generation | C. Caruncho | Presentation, Paper
    Tue 29.4.2008 | 16:00 - 19:00 | Basics of HMM-based speech synthesis | P. Gampp, A. Sereinig | Presentation1, Presentation2, Paper1, Paper2
    (same session) | | Speaker interpolation | S. Rexeis, M. Stracka | Presentation, Paper
    Tue 10.6.2008 | 16:00 - 19:00 | Synthesis of singing | R. Peharz, P. Meissner | Presentation1, Presentation2, Paper
    (same session) | | Conversational speech | J. Luig | Presentation, Paper
    Tue 24.6.2008 | 16:00 - 19:00 | VSDS/ftw. Presentation | M. Pucher, F. Neubarth, C. Kranzler, M. Bruss, D. Schabus, G. Schuchmann |


    Contact



    Michael Pucher

    Telecommunications Research Center Vienna (FTW), Tech Gate Vienna, Donau-City-Strasse 1, 3rd floor, A-1220 Vienna, Austria

    Phone: +43 1 505 2830-46

    Fax: +43 1 505 2830-99

    E-mail: pucher at ftw.at

    Web: http://dialect-tts.ftw.at, http://userver.ftw.at/~pucher, http://www.ftw.at

  • Created by klaus
    Last modified 2008-07-07 10:47