Deep learning technologies for speech and audio processing

Total activity: 50
Type of activity
Competitive project
Funding entity
Funding entity code
246.598,00 €
Start date
End date
Acoustic Event Detection, Deep Learning, Deep Neural Networks, Speaker Recognition, Speech Recognition, Speech Technology, Text to Speech
Deep learning methods are machine learning methods that use multiple processing layers to represent data at multiple levels of abstraction. Deep learning algorithms are also usually characterized by a simple and versatile structure. Specifically, deep learning is usually based on feedforward or recurrent multilayer neural networks that learn a particular model. Applying deep learning successfully requires selecting a good architecture for the neural network as well as an effective training procedure to learn the network's parameters.
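
As a rough illustration of a feedforward multilayer network and its training procedure, the following Python sketch trains a tiny network with one tanh hidden layer using backpropagation and stochastic gradient descent. The 2-4-1 architecture, the XOR toy data, the learning rate and the names `TinyMLP` and `train` are illustrative assumptions, not the methods developed in this project; real speech and audio systems use far larger networks and dedicated frameworks.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """Hypothetical feedforward network: n_in -> n_hidden (tanh) -> 1 (sigmoid)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)  # fixed seed for reproducible initialization
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden layer: affine transform followed by tanh.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Output layer: affine transform followed by sigmoid.
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def loss(self, data) -> float:
        # Mean binary cross-entropy; y is clipped to avoid log(0).
        total = 0.0
        for x, t in data:
            _, y = self.forward(x)
            y = min(max(y, 1e-12), 1.0 - 1e-12)
            total -= t * math.log(y) + (1 - t) * math.log(1 - y)
        return total / len(data)

    def train_step(self, x, t, lr: float) -> None:
        h, y = self.forward(x)
        # For sigmoid + cross-entropy, d(loss)/d(output logit) = y - t.
        d_out = y - t
        # Backpropagate through w2 (old values) and the tanh derivative 1 - h^2.
        d_hidden = [d_out * w * (1 - hi * hi) for w, hi in zip(self.w2, h)]
        # Gradient-descent updates, output layer first.
        for j in range(len(self.w2)):
            self.w2[j] -= lr * d_out * h[j]
        self.b2 -= lr * d_out
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hidden[j] * x[i]
            self.b1[j] -= lr * d_hidden[j]


def train(model: TinyMLP, data, epochs: int, lr: float) -> None:
    # Plain per-example stochastic gradient descent.
    for _ in range(epochs):
        for x, t in data:
            model.train_step(x, t, lr)


# Toy data (assumption): XOR is not linearly separable, so it needs the hidden layer.
XOR = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 0)]

if __name__ == "__main__":
    net = TinyMLP(n_in=2, n_hidden=4, seed=0)
    print(f"loss before training: {net.loss(XOR):.3f}")
    train(net, XOR, epochs=2000, lr=0.5)
    print(f"loss after training:  {net.loss(XOR):.3f}")
```

The hidden-layer deltas are computed before the output weights are updated, so backpropagation uses the same weights as the forward pass. The architecture (number of layers and units) and the training settings (learning rate, epochs) are exactly the two design choices the paragraph above identifies as critical.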
In recent years, neural network modeling has made a strong comeback, thanks to results on effective learning algorithms for deep and recurrent neural networks. Other important factors in this renaissance are the availability of greater computing power and of large databases: large databases are needed to train multilayer structures with a large number of parameters, and the computational resources make this training feasible in a reasonable time.
Although deep learning has been in widespread use for only a few years, and despite the difficulty of analyzing the behavior of deep learning algorithms, its impact is already significant in areas such as image, speech and text processing, in both research and commercial applications. In speech recognition, for example, systems based on a simple, generic deep learning architecture now outperform traditional speech recognition systems built on a complex architecture with many speech-specific processing modules.
This project proposes the development of new deep learning methods for speech and audio processing, exploring new applications and building on the initial work of the research team and of the international community.
The project includes a comprehensive work package dedicated to deep learning and four further work packages dedicated to speech and speaker recognition, acoustic event detection, voice synthesis and voice translation. In the first work package we will derive new architectures and learning algorithms, taking into account computational cost and scalability to large databases, while the remaining work packages will explore their application to speech and audio processing.
Within this project we plan to continue disseminating the results and collaborating with other national and international research groups. We also expect to transfer the results to the several companies already interested in the project's topic and results. Specifically, the work plan details the cooperation with Hospital Sant Joan de Déu in Barcelona on monitoring the neonatal intensive care unit. We will also place emphasis on the evaluation of the results.
State Administration (Adm. Estat)
Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016
Resolution year
Funding program
Programa Estatal de Fomento de la Investigación Científica y Técnica de Excelencia
Funding subprogram
Subprograma Estatal de Generación de Conocimiento
Funding call
Excelencia: Proyectos I+D
Grant institution
Gobierno de España. Ministerio de Economía y Competitividad (MINECO)


Scientific and technological production
