UPC multimodal speaker diarization system for the 2018 Albayzin challenge

Author
India, M.; Sagastiberri, I.; Palau, P.; Sayrol, E.; Morros, J.R.; Hernando, J.
Type of activity
Presentation of work at congresses
Name of edition
International Conference on Advances in Speech and Language Technologies for Iberian Languages 2018
Date of publication
2018
Presentation date
2018-11-22
Book of congress proceedings
IberSPEECH 2018: program and proceedings: 21-23 November 2018: Barcelona, Spain
First page
199
Last page
203
Publisher
International Speech Communication Association (ISCA)
DOI
https://doi.org/10.21437/IberSPEECH.2018-40
Project funding
Multimodal Signal Processing and Machine Learning on Graphs
Tecnologías de aprendizaje profundo aplicadas al procesado de voz y audio
Repository
http://hdl.handle.net/2117/127821
URL
https://www.isca-speech.org/archive/IberSPEECH_2018/abstracts/IberS18_AE-2_India-Massana.html
Abstract
This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. The approach processes the speech and image signals separately. In the speech domain, speaker diarization is performed using identity embeddings created by a triplet-loss DNN that takes i-vectors as input. The triplet DNN is trained with an additional regularization loss that minimizes the variance of both positive and negative distances. A sliding window is the...
Citation
India, M. [et al.]. UPC multimodal speaker diarization system for the 2018 Albayzin challenge. In: International Conference on Advances in Speech and Language Technologies for Iberian Languages. "IberSPEECH 2018: program and proceedings: 21-23 November 2018: Barcelona, Spain". Baixas: International Speech Communication Association (ISCA), 2018, p. 199-203.
Keywords
Face diarization, Multimodal system, Speaker diarization
Research groups
GPI - Image and Video Processing Group
IDEAI-UPC - Intelligent Data Science and Artificial Intelligence Research Center
TALP - Centre for Language and Speech Technologies and Applications
VEU - Speech Processing Group