A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions

Authors
Ferrer, Luciana; McLaren, Mitchell
Publication year
2018
Language
English
Resource type
conference paper
Status
published version
Description
Probabilistic linear discriminant analysis (PLDA) is the leading method for computing scores in speaker recognition systems. The method models the vectors representing each audio sample as a sum of three terms: one that depends on the speaker identity, one that models the within-speaker variability, and one that models any remaining variability. The last two terms are assumed to be independent across samples. We recently proposed an extension of the PLDA method, which we termed Joint PLDA (JPLDA), where the second term is considered dependent on the type of nuisance condition present in the data (e.g., the language or channel). The proposed method led to significant gains for multilanguage speaker recognition when taking language as the nuisance condition. In this paper, we present a generalization of this approach that allows for multiple nuisance terms. We show results using language and several nuisance conditions describing the acoustic characteristics of the sample and demonstrate that jointly including all these factors in the model leads to better results than including only language or acoustic condition factors. Overall, we obtain relative improvements in detection cost function between 5% and 47% for various systems and test conditions with respect to standard PLDA approaches.
Affiliation: Ferrer, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: McLaren, Mitchell. SRI International. Speech Technology and Research Lab; United States
19th Annual Conference of the International Speech Communication Association: Speech research for emerging markets in multilingual societies
Hyderabad
India
International Speech Communication Association
Subject
SPEAKER RECOGNITION
PROBABILISTIC LINEAR DISCRIMINANT ANALYSIS
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/162954

id CONICETDig_185768dd264a992b2a05109c1e4033b3
oai_identifier_str oai:ri.conicet.gov.ar:11336/162954
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
dc.title.none.fl_str_mv A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions
dc.creator.none.fl_str_mv Ferrer, Luciana
McLaren, Mitchell
dc.subject.none.fl_str_mv SPEAKER RECOGNITION
PROBABILISTIC LINEAR DISCRIMINANT ANALYSIS
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2018
dc.date.none.fl_str_mv 2018
dc.type.none.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/conferenceObject
Conference
Book
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
status_str publishedVersion
format conferenceObject
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/162954
A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions; 19th Annual Conference of the International Speech Communication Association: Speech research for emerging markets in multilingual societies; Hyderabad; India; 2018; 82-86
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://www.isca-speech.org/archive/interspeech_2018/ferrer18_interspeech.html
info:eu-repo/semantics/altIdentifier/doi/10.21437/Interspeech.2018-1280
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv International
dc.publisher.none.fl_str_mv International Speech Communication Association
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar