Exploring efficient neural architectures for linguistic-acoustic mapping in text-to-speech

Author
Pascual, S.; Serra, J.; Bonafonte, A.
Type of activity
Journal article
Journal
Applied Sciences
Date of publication
2019-08-17
Volume
9
Number
16
First page
1
Last page
14
DOI
10.3390/app9163391
Project funding
Deep learning technologies for speech and audio processing
Repository
http://hdl.handle.net/2117/180077
URL
https://www.mdpi.com/2076-3417/9/16/3391
Abstract
Conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models such as recurrent neural networks. Despite the good performance of such models (in terms of low distortion in the generated speech), their recursive structure with intermediate affine transformations tends to make them slow to train and to sample from. In this work, we explore two different mechanisms that enhance the operatio...
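Illustration (not the paper's architecture): one of the mechanisms named in the keywords, self-attention, replaces the step-by-step recurrence of an RNN with a parallel mixing of all timesteps. The minimal sketch below maps a sequence of linguistic feature vectors to acoustic-frame predictions with a single scaled dot-product self-attention layer; all dimensions, weights, and names are hypothetical stand-ins.

# Minimal sketch only, assuming random stand-in weights; not the model
# described in the article. Shows how self-attention processes every
# timestep in parallel, avoiding a recursive structure.
import numpy as np

rng = np.random.default_rng(0)

T, d_in, d_model, d_out = 50, 55, 64, 80    # frames, linguistic dims, hidden size, acoustic dims
X = rng.standard_normal((T, d_in))          # linguistic features, one row per frame

# Projections (random here, stand-ins for trained parameters)
W_q = rng.standard_normal((d_in, d_model))
W_k = rng.standard_normal((d_in, d_model))
W_v = rng.standard_normal((d_in, d_model))
W_o = rng.standard_normal((d_model, d_out))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Scaled dot-product self-attention: each output frame attends to every
# input frame at once, instead of stepping through a recurrence.
scores = Q @ K.T / np.sqrt(d_model)
scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)

H = weights @ V                                # context-mixed hidden states
Y = H @ W_o                                    # predicted acoustic frames, shape (T, d_out)
print(Y.shape)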
Citation
Pascual, S.; Serra, J.; Bonafonte, A. Exploring efficient neural architectures for linguistic-acoustic mapping in text-to-speech. "Applied Sciences", 17 August 2019, vol. 9, no. 16, p. 1-14.
Keywords
Acoustic model, Deep learning, Quasi-recurrent neural networks, Recurrent neural networks, Self-attention, Speech synthesis, Text-to-speech
Group of research
IDEAI-UPC - Intelligent Data Science and Artificial Intelligence Research Center
TALP - Centre for Language and Speech Technologies and Applications
VEU - Speech Processing Group
