
Speech Synthesis

Seminar on Speech Synthesis

Speech synthesis is the generation of an acoustic speech signal by a machine, typically a computer. Its applications include text-to-speech (TTS) systems such as e-mail or news readers, dialogue systems such as train schedule information or flight reservation services, and automatic speech-to-speech translation systems.

The beginnings of speech synthesis date back to the 18th century, when Wolfgang von Kempelen built a mechanical speaking machine for Empress Maria Theresia. In the 20th century, electronic speech synthesizers evolved, and today digital signal processing allows advanced speech processing algorithms to be implemented on PCs.

In this seminar, we will primarily be concerned with the signal generation part of speech synthesis, reviewing state-of-the-art algorithms. However, we may also touch on related topics such as text-to-phoneme conversion, prosody generation, synthesis of multilingual text or emotional speech, and voice conversion.

Course organization

Each student works on a selected topic and gives an oral presentation in class during a 45-minute discussion session. Working in small groups of 2 or 3 students is strongly encouraged.

The first meeting will be held in the seminar room of the INW at TU Graz, Inffeldgasse 12, first floor, on Wednesday, Oct. 8, 2003, at 2:00 p.m. It will be used to assign groups and topics and to coordinate the seminar schedule.

First meeting, and group and topic assignment

8. 10. 2003	14:00	Seminar Room INW

List of participants and assigned topics

Date	Participant's name	Presentation
22. 10. 2003	Erhard Rank	Speech synthesis: A brief motivation and overview
29. 10. 2003	Marco Piccolino	Text-to-speech: the Linguistic Perspective
5. 11. 2003	Hannes Pirker, Martin Hagmüller	Intonation modelling (Fujisaki and more), Physical modeling I: Vocal folds
12. 11. 2003	Helmuth Ploner-Bernard	Physical modeling II: Speech Synthesis by Articulatory Models (report)
19. 11. 2003	Markus Flohberger	Source-filter modeling (report), Audio Examples: Klatt_vowel, Klatt_bah, Multivox, Multipulse_Linear_Prediction, DECTALK, Female_Voice_Dennis_Klatt_86. Animated GIFs: One-mass model, Three-mass model.
26. 11. 2003	Thomas Wiener	Database-driven speech synthesis systems (ppt) (report)
3. 12. 2003	David Ludwig	Prosodic manipulation (ppt)
17. 12. 2003	Franz Zotter	Emotional speech (report)
14. 1. 2004	Robin Hofe	Speaking styles (ppt)

Links

Examples and Demos

	Bell Labs TTS Demo
	Natural Voices Demo
	SVOX Interactive Demo
	Speech Synthesis Examples, Stuttgart
	Expressive Synthesized Speech
	Emotional Synthesis Examples

Freely available synthesizers

	The Festival Speech Synthesis System
	The MBROLA Project Homepage

References/Course material

Physical modeling

	K. Ishizaka and J. L. Flanagan: Synthesis of Voiced Sounds from a Two-Mass Model of the Vocal Cords, Bell System Technical Journal, vol. 51, pp. 1233-1267, 1972.
	G. Bailly: Learning to speak. Sensori-motor control of speech movements, Speech Communication, vol. 22, iss. 2-3, pp. 251-267, 1997.
	P. Badin, G. Bailly, M. Raybaudi, C. Segebarth: A Three-Dimensional Linear Articulatory Model Based on MRI Data, ESCA/COCOSDA Workshop on Speech Synthesis, Jenolan Caves, Australia, pp. 249-254, 1998.

Source-filter modeling

	J. D. Markel and A. H. Gray, Jr.: Linear Prediction of Speech, Springer, 1976.
	D. H. Klatt: Software for a Cascade/Parallel Formant Synthesizer, J. Acoust. Soc. Am., vol. 67, pp. 971-995, 1980.
	G. Fant: The Voice Source in Connected Speech, Speech Communication, vol. 22, iss. 2-3, pp. 125-139, 1997.
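The cascade formant synthesizer described by Klatt (1980) shapes a periodic excitation with a chain of second-order digital resonators, one per formant. The following is a minimal sketch of that source-filter idea, not Klatt's full synthesizer: it assumes a plain impulse train as the voice source (no glottal pulse shaping, no noise source), the function names are hypothetical, and the formant frequencies and bandwidths in the usage line are rough illustrative values for an /a/-like vowel.

```python
import math

def resonator(x, freq, bw, fs):
    """Second-order digital resonator in the form used by Klatt (1980):
    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], with unity gain at 0 Hz."""
    T = 1.0 / fs
    C = -math.exp(-2.0 * math.pi * bw * T)
    B = 2.0 * math.exp(-math.pi * bw * T) * math.cos(2.0 * math.pi * freq * T)
    A = 1.0 - B - C  # normalizes the DC gain to 1
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = A * sample + B * y1 + C * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0, fs, n_samples):
    """Simplified voice source: one unit impulse per pitch period (assumption)."""
    period = int(round(fs / f0))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def synth_vowel(formants, f0=100.0, fs=10000, dur=0.5):
    """Hypothetical helper: pass the source through a cascade of resonators."""
    signal = impulse_train(f0, fs, int(dur * fs))
    for freq, bw in formants:  # cascade: each resonator filters the previous output
        signal = resonator(signal, freq, bw, fs)
    return signal

# Illustrative (approximate) formants and bandwidths in Hz for an /a/-like vowel.
vowel = synth_vowel([(700, 130), (1220, 70), (2600, 160)])
```

A cascade connection is the simplest choice for vowels because the relative formant amplitudes follow automatically from the resonator chain; Klatt's synthesizer additionally offers a parallel branch, which this sketch omits.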

Unit selection

	A. Hunt and A. Black: Unit selection in a concatenative speech synthesis system using a large speech database, Proc. ICASSP 1996, vol. 1, pp. 373-376, Atlanta, Georgia.
	A. W. Black and P. Taylor: Automatically clustering similar units for unit selection in speech synthesis, Proc. Eurospeech 1997, Rhodes, Greece.
	B. Bozkurt, M. Bagein, and T. Dutoit: From MBROLA to NU-MBROLA, Multitel-TCTS Lab, Faculté Polytechnique de Mons, Belgium, 2001.
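Hunt and Black (1996) cast unit selection as a search for the sequence of database units that minimizes the sum of target costs (how well a unit matches its specification) and concatenation costs (how smoothly adjacent units join), solved with a Viterbi search. The following is a minimal dynamic-programming sketch of that search; the function name, the unit representation as (name, pitch in Hz) pairs, and both cost functions are hypothetical toy choices, whereas real systems combine many weighted sub-costs over spectral, prosodic, and contextual features.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Viterbi search over candidate units.

    targets    -- one target specification per slot
    candidates -- candidates[i] is the list of database units for slot i
    Returns (total_cost, chosen_units).
    """
    # best[i][j] = (lowest cost of any path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            path_cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((path_cost + target_cost(targets[i], u), back))
        best.append(row)
    # Backtrack from the cheapest unit in the last slot.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    chosen = []
    for i in range(len(targets) - 1, -1, -1):
        chosen.append(candidates[i][j])
        j = best[i][j][1]
    chosen.reverse()
    return total, chosen

# Toy example: units are hypothetical (name, pitch_hz) pairs, targets are desired pitches.
targets = [100, 110, 120]
candidates = [
    [("a1", 100), ("a2", 130)],
    [("b1", 105), ("b2", 125)],
    [("c1", 120), ("c2", 90)],
]
total, chosen = select_units(
    targets,
    candidates,
    target_cost=lambda t, u: abs(u[1] - t),
    join_cost=lambda prev, u: 0.5 * abs(prev[1] - u[1]),
)
print(total, [name for name, _ in chosen])  # 15.0 ['a1', 'b1', 'c1']
```

Because the join cost only depends on adjacent units, the search keeps one best path per candidate and slot, so its cost grows linearly with the number of slots rather than exponentially.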

Prosodic manipulations

	E. Moulines and F. Charpentier: Pitch-Synchronous Waveform Processing Techniques for Text-to-Speech Synthesis using Diphones, Speech Communication, vol. 9, pp. 452-467, 1990.
	T. Dutoit and H. Leich: MBR-PSOLA: Text-To-Speech synthesis based on an MBE re-synthesis of the segments database, Speech Communication, vol. 13, pp. 435-440, 1993.
	J. Laroche, Y. Stylianou, and E. Moulines: HNS: Speech Modification Based on a Harmonic+Noise Model, Proc. ICASSP 1993, vol. 2, pp. 550-553.
	J. Laroche, Y. Stylianou, and E. Moulines: HNS: A Simple, Efficient Harmonic + Noise Model for Speech, Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 1993, pp. 169-172.
	Y. Stylianou: Applying the harmonic plus noise model in concatenative speech synthesis, IEEE Trans. on Speech and Audio Processing, vol. 9, no. 1, pp. 21-29, Jan. 2001.
	G. Bailly: A parametric harmonic + noise model, in Keller et al.: Improvements in Speech Synthesis, Wiley, 2002.
	E. R. Banga et al.: Concatenative text-to-speech synthesis based on sinusoidal modelling, in Keller et al.: Improvements in Speech Synthesis, Wiley, 2002.
	D. O'Brien and A. Monaghan: Shape invariant pitch and time-scale modification of speech based on a harmonic model, in Keller et al.: Improvements in Speech Synthesis, Wiley, 2002.
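TD-PSOLA, the time-domain variant of the pitch-synchronous overlap-add technique of Moulines and Charpentier (1990), changes pitch by extracting two-period, Hann-windowed segments centered on the analysis pitch marks and overlap-adding them at a new spacing. The following is a minimal sketch under strong simplifying assumptions: the pitch marks are already known, the pitch period is constant, only pitch (not duration) is modified, and amplitude is not renormalized when the synthesis spacing changes. Real implementations derive a local period from neighboring marks and treat unvoiced frames separately. The function names are hypothetical.

```python
import math

def hann(n):
    # Periodic Hann window of length n; copies shifted by n/2 sum to exactly 1.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]

def td_psola(x, marks, period, factor):
    """Pitch-shift x by `factor` (>1 raises pitch) while keeping its duration.

    x      -- list of samples
    marks  -- analysis pitch marks (sample indices), assumed known
    period -- pitch period in samples, assumed constant (simplification)
    """
    new_period = period / factor
    win = hann(2 * period)
    out = [0.0] * len(x)
    t = float(marks[0])
    while t < marks[-1]:
        # Reuse the analysis segment whose mark is closest to this synthesis mark;
        # segments are duplicated or skipped so that the duration is preserved.
        m = min(marks, key=lambda a: abs(a - t))
        center = int(round(t))
        for k in range(2 * period):
            src = m - period + k
            dst = center - period + k
            if 0 <= src < len(x) and 0 <= dst < len(out):
                out[dst] += win[k] * x[src]
        t += new_period
    return out

# Usage with a synthetic periodic signal (period 50 samples) and ideal pitch marks.
period = 50
x = [math.sin(2 * math.pi * n / period) + 0.3 * math.sin(6 * math.pi * n / period)
     for n in range(1000)]
marks = list(range(0, len(x), period))
higher = td_psola(x, marks, period, 1.5)
```

With `factor=1.0` the synthesis marks coincide with the analysis marks and the overlapping Hann windows sum to one, so the signal is reconstructed unchanged between the first and last marks; this is a quick sanity check for any overlap-add implementation.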

Emotions, speaking styles, voice conversion

	Proceedings of the ISCA Workshop on Speech and Emotion, Belfast, 2000.
	Book chapters 19-28 (Part III): Issues in Styles of Speech, in Keller et al.: Improvements in Speech Synthesis, Wiley, 2002.
	D. G. Childers: Glottal Source Modeling for Voice Conversion, Speech Communication, vol. 16, pp. 127-138, 1995.
	I. Titze, D. Wong, B. Story, and R. Long: Considerations in voice transformation with physiologic scaling principles, Speech Communication, vol. 22, iss. 2-3, pp. 113-123, 1997.
	A. Kain and M. Macon: Spectral Voice Conversion for Text-to-Speech Synthesis, Proc. ICASSP 1998, vol. 1, pp. 285-288.

Other sources

	IEEE Xplore
	CiteSeer

Created by marian
Last modified 2005-10-25 16:43
