Wav2Pix: speech-conditioned face generation using generative adversarial networks

Author
Cardoso, A.; Roldan, F.; Tubau, M.; Escur, J.; Pascual, S.; Salvador, A.; Mohedano, E.; McGuinness, K.; Torres, J.; Giro, X.
Type of activity
Presentation of work at congresses
Name of edition
2019 IEEE International Conference on Acoustics, Speech and Signal Processing
Date of publication
2019
Presentation's date
2019-05-16
Book of congress proceedings
2019 IEEE International Conference on Acoustics, Speech, and Signal Processing: proceedings: May 12-17, 2019: Brighton Conference Centre, Brighton, United Kingdom
First page
8633
Last page
8637
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
DOI
10.1109/ICASSP.2019.8682970
Project funding
Deep learning technologies for speech and audio processing
Multimodal Signal Processing and Machine Learning on Graphs
Repository
http://hdl.handle.net/2117/167073
https://imatge.upc.edu/web/publications/wav2pix-speech-conditioned-face-generation-using-generative-adversarial-networks
URL
https://ieeexplore.ieee.org/document/8682970
Abstract
Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) with raw speech input. We propose a deep neural network that is trained from scratch in an end-to-end fashion, generating a face directly from the raw speech waveform without any additional identity information (e.g., a reference image or one-hot enc...
Citation
Cardoso, A. [et al.]. Wav2Pix: speech-conditioned face generation using generative adversarial networks. In: IEEE International Conference on Acoustics, Speech, and Signal Processing. "2019 IEEE International Conference on Acoustics, Speech, and Signal Processing: proceedings: May 12-17, 2019: Brighton Conference Centre, Brighton, United Kingdom". Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 8633-8637.
Keywords
Adversarial learning, Computer vision, Deep learning, Face, Face synthesis, Feature extraction, Generative adversarial networks, Generators, Videos, Visualization
Group of research
CAP - High Performance Computing Group
GPI - Image and Video Processing Group
IDEAI-UPC - Intelligent Data Science and Artificial Intelligence Research Center
TALP - Centre for Language and Speech Technologies and Applications
VEU - Speech Processing Group

Participants