Cross-modal embeddings for video and audio retrieval

Author
Suris, D.; Cardoso, A.; Salvador, A.; Torres, J.; Giro, X.
Type of activity
Presentation of work at congresses
Name of edition
Women in Computer Vision Workshop 2018
Date of publication
2018
Presentation's date
2018-09-09
Book of congress proceedings
Computer Vision, ECCV 2018 Workshops: Munich, Germany, September 8-14, 2018: proceedings, part IV
First page
711
Last page
716
Publisher
Springer
DOI
https://doi.org/10.1007/978-3-030-11018-5_62
Project funding
Multimodal Signal Processing and Machine Learning on Graphs
Severo Ochoa BSC Program Coordinator
Repository
http://hdl.handle.net/2117/129095
https://imatge.upc.edu/web/publications/cross-modal-embeddings-video-and-audio-retrieval
URL
https://link.springer.com/chapter/10.1007/978-3-030-11018-5_62
Abstract
In this work, we explore the multi-modal information provided by the YouTube-8M dataset by projecting the audio and visual features into a common feature space, to obtain joint audio-visual embeddings. These links are used to retrieve audio samples that fit well with a given silent video, and also to retrieve images that match a given query audio. The results in terms of Recall@K obtained over a subset of YouTube-8M videos show the potential of this unsupervised approach for cross-modal feature le...
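As an illustration of the Recall@K metric mentioned in the abstract, here is a minimal sketch in Python. It is not the authors' evaluation code. It assumes that each video embedding is paired with the audio embedding at the same index, and that candidates are ranked by cosine similarity (the record does not state which similarity the paper uses). The names cosine, recall_at_k, video_emb and audio_emb are hypothetical, and the toy vectors are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k(queries, gallery, k):
    """Fraction of queries whose paired item (same index) ranks in the top k."""
    hits = 0
    for i, q in enumerate(queries):
        scores = [cosine(q, g) for g in gallery]
        # Gallery indices sorted from most to least similar to the query.
        ranked = sorted(range(len(gallery)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Toy joint-space embeddings (assumed values): index i in both lists
# stands for the same video.
video_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_emb = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.7]]

for k in (1, 2, 3):
    print(f"Recall@{k} (video -> audio): {recall_at_k(video_emb, audio_emb, k):.3f}")
```

Swapping the two arguments gives the audio-to-video direction; in both cases a higher value at a given K means the matching item is found in the top K results more often.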
Citation
Surís, D. [et al.]. Cross-modal embeddings for video and audio retrieval. In: Women in Computer Vision Workshop. "Computer Vision, ECCV 2018 Workshops: Munich, Germany, September 8-14, 2018: proceedings, part IV". Berlin: Springer, 2019, p. 711-716.
Keywords
Cross-modal, Retrieval, YouTube-8M
Group of research
CAP - High Performance Computing Group
GPI - Image and Video Processing Group
IDEAI-UPC - Intelligent Data Science and Artificial Intelligence Research Center