One perceptron to rule them all: language, vision, audio and speech

Author
Giro, X.
Type of activity
Presentation of work at congresses
Name of edition
ACM International Conference on Multimedia Retrieval 2020
Date of publication
2020
Presentation's date
2020-10-26
Book of congress proceedings
ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval
First page
7
Last page
8
Publisher
Association for Computing Machinery (ACM)
DOI
10.1145/3372278.3390740
Project funding
Deep learning for video analytics in sport events
MALEGRA, TEC2016-75976-R
Repository
http://hdl.handle.net/2117/192196
https://imatge.upc.edu/web/publications/one-perceptron-rule-them-all-language-vision-audio-and-speech-tutorial
URL
https://dl.acm.org/doi/abs/10.1145/3372278.3390740
Abstract
Deep neural networks have boosted the convergence of multimedia data analytics in a unified framework shared by practitioners in natural language, vision and speech. Image captioning, lip reading and video sonorization are some of the first applications of a new and exciting field of research exploiting the generalization properties of deep neural representations. This tutorial will first review the basic neural architectures to encode and decode vision, text and audio, and then review those...
Citation
Giro, X. One perceptron to rule them all: language, vision, audio and speech. In: ACM International Conference on Multimedia Retrieval. "ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval". New York: Association for Computing Machinery (ACM), 2020, p. 7-8.
Keywords
Cross-modal, Deep learning, Joint embeddings, Multimodal
Group of research
GPI - Image and Video Processing Group
IDEAI-UPC - Intelligent Data Science and Artificial Intelligence Research Center

Participants