UPC multimodal speaker diarization system for the 2018 Albayzin challenge

Authors
India, M.; Sagastiberri, I.; Palau, P.; Sayrol, E.; Morros, J.R.; Hernando, J.
Activity type
Conference paper presentation
Edition name
International Conference on Advances in Speech and Language Technologies for Iberian Languages 2018
Edition year
2018
Presentation date
2018-11-22
Proceedings book
IberSPEECH 2018: program and proceedings: 21-23 November 2018: Barcelona, Spain
First page
199
Last page
203
Publisher
International Speech Communication Association (ISCA)
DOI
https://doi.org/10.21437/IberSPEECH.2018-40
Funding projects
Procesado de señales multimodales y aprendizaje automático en grafos.
Tecnologías de aprendizaje profundo aplicadas al procesado de voz y audio
Repository
http://hdl.handle.net/2117/127821
URL
https://www.isca-speech.org/archive/IberSPEECH_2018/abstracts/IberS18_AE-2_India-Massana.html
Abstract
This paper presents the UPC system proposed for the Multimodal Speaker Diarization task of the 2018 Albayzin Challenge. The approach processes the speech and image signals independently. In the speech domain, speaker diarization is performed using identity embeddings created by a triplet loss DNN that takes i-vectors as input. The triplet DNN is trained with an additional regularization loss that minimizes the variance of both positive and negative distances. A sliding window is the...
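As a rough illustration of the training objective the abstract describes, the sketch below combines a standard triplet hinge loss with a penalty on the variance of the positive and negative distances. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the Euclidean distance, the `margin` and `reg_weight` values, and the function name `triplet_loss_with_variance` are all hypothetical choices, and the inputs stand in for embeddings that the paper's DNN would compute from i-vectors.

```python
import numpy as np


def triplet_loss_with_variance(anchor, positive, negative,
                               margin=1.0, reg_weight=0.1):
    """Triplet hinge loss plus a variance penalty on both distance sets.

    anchor, positive, negative: arrays of shape (batch, dim) holding
    embeddings (assumed here; the paper derives them from i-vectors).
    margin and reg_weight are illustrative values, not from the paper.
    """
    # Euclidean distance anchor-positive and anchor-negative per triplet
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)

    # Standard triplet hinge: positives should be closer than negatives
    # by at least `margin`
    hinge = np.maximum(0.0, d_pos - d_neg + margin)

    # Regularizer: variance of positive distances plus variance of
    # negative distances across the batch
    variance_reg = np.var(d_pos) + np.var(d_neg)

    return hinge.mean() + reg_weight * variance_reg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(8, 16))
    p = a + 0.1 * rng.normal(size=(8, 16))   # same-speaker stand-ins
    n = rng.normal(size=(8, 16))             # different-speaker stand-ins
    print(triplet_loss_with_variance(a, p, n))
```

Penalizing the two variances pushes same-speaker distances toward a common value and different-speaker distances toward another, which is one plausible reading of the regularization term mentioned in the abstract.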
Citation
India, M. [et al.]. UPC multimodal speaker diarization system for the 2018 Albayzin challenge. In: International Conference on Advances in Speech and Language Technologies for Iberian Languages. "IberSPEECH 2018: program and proceedings: 21-23 November 2018: Barcelona, Spain". Baixas: International Speech Communication Association (ISCA), 2018, p. 199-203.
Keywords
Face diarization, Multimodal system, Speaker diarization
Research groups
GPI - Grup de Processament d'Imatge i Vídeo
IDEAI-UPC - Intelligent Data Science and Artificial Intelligence Research Center
TALP - Centre de Tecnologies i Aplicacions del Llenguatge i la Parla
VEU - Grup de Tractament de la Parla