A VQ based speaker recognition system based in histogram distances. Text independent and for noisy environments
Author
Monte, E.; Arqué, R.; Miro, X.
Activity type
Conference paper presentation
Edition name
5th International Conference on Spoken Language Processing
Edition year
1998
Presentation date
1998-11
Proceedings
ICSLP 98: the 5th International Conference on Spoken Language Processing; incorporating the 7th Australian International Speech Science and Technology Conference; Sydney Convention Centre, Sydney, Australia, 30th November-4th December 1998
In speaker recognition systems based on VQ, each speaker is
normally assigned a codebook, and classification is done by
means of the distortion of the utterance computed with each
codebook. In [1] we proposed a system which, instead of having
a codebook for each speaker, had a single codebook for all the
speakers and one histogram for each speaker. This histogram was
the occupancy rate of each codeword for a given speaker. This
means that the histogram of a given speaker gives the
probability that the speaker utters the information related to
each codeword. We therefore approximated the pdf of each
speaker by the normalized histogram.
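As an illustration of the occupancy histogram described above, here is a minimal sketch in Python with NumPy. It assumes nearest-codeword assignment by squared Euclidean distance and adds a small floor to every bin; the function name, the `eps` floor, and the array shapes are assumptions of this sketch and are not taken from the paper.

```python
import numpy as np

def occupancy_histogram(features, codebook, eps=1e-6):
    """Normalized codeword-occupancy histogram for one speaker.

    features: (n_frames, dim) array of feature vectors.
    codebook: (n_codewords, dim) array shared by all speakers.
    eps: small floor added to every bin (an assumption of this sketch)
         so that no bin is exactly zero.
    """
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every frame to every codeword.
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Index of the nearest codeword for each frame.
    nearest = np.argmin(dists, axis=1)
    # Count how often each codeword was selected, then normalize.
    counts = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    counts += eps
    return counts / counts.sum()
```

Calling `occupancy_histogram(frames, shared_codebook)` once per speaker on the training data would give one histogram per speaker, all defined over the same codewords and therefore directly comparable.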
In this paper we present an exhaustive study of different
measures for comparing histograms: Kullback-Leibler,
log-difference of each probability, geometrical distance, and
the Euclidean distance.
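The abstract does not give formulas for these measures, so the following Python sketch uses common textbook forms as assumptions: the Kullback-Leibler divergence from the test histogram to the speaker histogram, the sum of absolute differences of log-probabilities for the log-difference, and the ordinary Euclidean distance. The geometrical distance is left out because its definition cannot be recovered from the abstract. The function names and the nearest-histogram decision rule are also hypothetical.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence sum(p * log(p / q)); assumes no zero bins."""
    return float(np.sum(p * np.log(p / q)))

def log_difference(p, q):
    """Sum of absolute differences of log-probabilities (assumed form)."""
    return float(np.sum(np.abs(np.log(p) - np.log(q))))

def euclidean(p, q):
    """Euclidean distance between two histograms."""
    return float(np.sqrt(np.sum((p - q) ** 2)))

def classify(test_hist, speaker_hists, distance):
    """Return the speaker whose histogram is closest to the test histogram.

    speaker_hists: dict mapping a speaker name to a normalized histogram.
    distance: one of the functions above.
    """
    return min(speaker_hists,
               key=lambda name: distance(test_hist, speaker_hists[name]))
```

For example, `classify(test_hist, speaker_hists, kl_divergence)` picks the enrolled speaker with the smallest divergence. The logarithm-based measures need every bin to be strictly positive, so histograms should be floored before comparison.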
We have also carried out an exhaustive study of the properties
of the system for each distance in the presence of noise (white
and colored), and for different parameterizations: LPC, MFCC,
LPC-Cepstrum-OSA (one-sided autocorrelation sequence), and
LPC-Cepstrum (cepstrum with/without liftering).
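The abstract does not say how the noisy test conditions were produced. One common way to create a white-noise condition at a chosen level is to scale Gaussian noise to a target signal-to-noise ratio, as in the hedged sketch below; the function name, the use of Gaussian noise, and the SNR definition based on mean power are all assumptions of this sketch.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Return the signal plus white Gaussian noise at the requested SNR in dB.

    SNR is defined here as the ratio of mean signal power to mean noise
    power (an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    # Draw unit-variance noise, then rescale it to the target power.
    noise = rng.standard_normal(signal.shape)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return signal + noise
```

Colored noise could be obtained by filtering the white noise before rescaling, which this sketch does not attempt.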
As the number of experiment combinations was high, the
conclusions were drawn after an analysis of variance (ANOVA)
and t-tests. Thus conclusions, with significance levels, can be
drawn about the differences and interactions between kind of
distance, parameterization, kind of noise and level of noise.
Citation
Monte, E., Arqué, R., Miro, X. A VQ based speaker recognition system based in histogram distances. Text independent and for noisy environments. In: International Conference on Spoken Language Processing. "ICSLP'98 Proceedings". Sydney: Robert H. Mannel and Jordi Robert-Ribes, 1998, p. 185-188.