Pascual, S.; Bonafonte, A. International Conference on Advances in Speech and Language Technologies for Iberian Languages, p. 64-72. DOI: 10.1007/978-3-319-49169-1_7. Presentation date: 2016-11. Conference paper presentation.
Predicting prosodic breaks from text is a fundamental task for achieving naturalness in text-to-speech applications. In this work we build a data-driven break predictor from linguistic features such as Part of Speech (POS) tags and the forward and backward word distance to punctuation marks, and we use a basic Recurrent Neural Network (RNN) model to exploit the sequential dependency between decisions. In the experiments we evaluate the performance of a logistic regression model and of the recurrent one. The results show that logistic regression outperforms the baseline (CART) by 9.5% in F-score, and that adding the recurrent layer to the model further improves the predictions, outperforming the baseline by 11%.
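
The forward and backward word distance to punctuation marks mentioned in the abstract could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: the function name `punctuation_distances`, the punctuation set, and the convention of counting the words that separate a token from the nearest punctuation mark (with the start and end of the text treated like punctuation) are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed punctuation inventory; the paper's actual set is not given in the abstract.
PUNCTUATION = {".", ",", ";", ":", "?", "!"}


def punctuation_distances(tokens: List[str]) -> List[Tuple[str, int, int]]:
    """Return (word, backward, forward) for every non-punctuation token.

    backward: number of words between this word and the previous
              punctuation mark (or the start of the text).
    forward:  number of words between this word and the next
              punctuation mark (or the end of the text).
    """
    n = len(tokens)

    # Left-to-right pass: count words seen since the last punctuation mark.
    backward = [0] * n
    count = 0
    for i, tok in enumerate(tokens):
        if tok in PUNCTUATION:
            count = 0
        else:
            backward[i] = count
            count += 1

    # Right-to-left pass: count words seen since the next punctuation mark.
    forward = [0] * n
    count = 0
    for i in range(n - 1, -1, -1):
        if tokens[i] in PUNCTUATION:
            count = 0
        else:
            forward[i] = count
            count += 1

    return [
        (tok, backward[i], forward[i])
        for i, tok in enumerate(tokens)
        if tok not in PUNCTUATION
    ]


if __name__ == "__main__":
    sentence = ["After", "the", "rain", ",", "we", "left", "."]
    for word, back, fwd in punctuation_distances(sentence):
        print(f"{word}\tbackward={back}\tforward={fwd}")
```

In a predictor of the kind described, each word's two distances would be combined with its POS tag to form the input vector for the logistic regression or recurrent model.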