LSTM neural network-based speaker segmentation using acoustic and language modelling

Author
India, M.; Fonollosa, José A. R.; Hernando, J.
Type of activity
Conference presentation
Name of edition
18th Annual Conference of the International Speech Communication Association
Date of publication
2017
Presentation date
2017-08-23
Conference proceedings
INTERSPEECH 2017: 20-24 August 2017: Stockholm
First page
2834
Last page
2838
Publisher
International Speech Communication Association (ISCA)
DOI
https://doi.org/10.21437/Interspeech.2017
Repository
http://hdl.handle.net/2117/112988
URL
http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0407.PDF
Abstract
This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks using acoustic data and linguistic content. Language modelling is combined with two different Joint Factor Analysis (JFA) acoustic approaches: i-vectors and speaker factors. Both of them are compared with a baseline algorithm that uses cosine distance to detect speaker turn changes. LSTM neural networks with both linguistic and acoustic features have been able to prod...
Citation
India, M., Fonollosa, José A. R., Hernando, J. LSTM neural network-based speaker segmentation using acoustic and language modelling. In: Annual Conference of the International Speech Communication Association. "INTERSPEECH 2017: 20-24 August 2017: Stockholm". Stockholm: International Speech Communication Association (ISCA), 2017, p. 2834-2838.
Keywords
I-vectors, LSTM neural networks, Neural language modelling, Speaker factors, Speaker segmentation
Research groups
IDEAI-UPC - Intelligent Data Science and Artificial Intelligence Research Center
TALP - Centre for Language and Speech Technologies and Applications
VEU - Speech Processing Group
