A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions

Authors: Ferrer, Luciana; McLaren, Mitchell
Publication year: 2018
Language: English
Resource type: conference paper
Status: published version
Description: Probabilistic linear discriminant analysis (PLDA) is the leading method for computing scores in speaker recognition systems. The method models the vectors representing each audio sample as a sum of three terms: one that depends on the speaker identity, one that models the within-speaker variability, and one that models any remaining variability. The last two terms are assumed to be independent across samples. We recently proposed an extension of the PLDA method, which we termed Joint PLDA (JPLDA), where the second term is considered dependent on the type of nuisance condition present in the data (e.g., the language or channel). The proposed method led to significant gains for multilanguage speaker recognition when taking language as the nuisance condition. In this paper, we present a generalization of this approach that allows for multiple nuisance terms. We show results using language and several nuisance conditions describing the acoustic characteristics of the sample and demonstrate that jointly including all these factors in the model leads to better results than including only language or acoustic condition factors. Overall, we obtain relative improvements in detection cost function between 5% and 47% for various systems and test conditions with respect to standard PLDA approaches.
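The additive structure described in the abstract can be written out as a rough sketch. The notation below is an assumption made for illustration and is not taken from this record: m_i is the vector for sample i, \mu is a global mean offset (assumed here, not one of the three terms named above), s_i is the speaker of sample i, c_i is its nuisance condition label, V and U (or U_k) are loading matrices, y, x and w are latent variables, and z_i is the residual term. The paper itself may use different symbols or constraints.

```latex
% Standard PLDA: speaker term + per-sample within-speaker term + residual
m_i = \mu + V\, y_{s_i} + U\, x_i + z_i

% JPLDA (assumed form): the second term is tied to the nuisance condition c_i
% of the sample instead of being independent across samples
m_i = \mu + V\, y_{s_i} + U\, w_{c_i} + z_i

% Generalization to K nuisance conditions (assumed form): one term per
% condition type k, indexed by the label c_{k,i} of sample i for that type
m_i = \mu + V\, y_{s_i} + \sum_{k=1}^{K} U_k\, w^{(k)}_{c_{k,i}} + z_i
```

Under this reading, samples that share a condition label (for example, the same language) share the corresponding latent variable, which is what distinguishes the joint models from standard PLDA.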
Affiliation: Ferrer, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: McLaren, Mitchell. SRI International. Speech Technology and Research Lab; United States
19th Annual Conference of the International Speech Communication Association: Speech research for emerging markets in multilingual societies
Hyderabad
India
International Speech Communication Association
Subject: SPEAKER RECOGNITION
PROBABILISTIC LINEAR DISCRIMINANT ANALYSIS
Access level: open access
Terms of use: https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
Institution: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier: oai:ri.conicet.gov.ar:11336/162954

id	CONICETDig_185768dd264a992b2a05109c1e4033b3
oai_identifier_str	oai:ri.conicet.gov.ar:11336/162954
network_acronym_str	CONICETDig
repository_id_str	3498
network_name_str	CONICET Digital (CONICET)
dc.title.none.fl_str_mv	A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions
dc.creator.none.fl_str_mv	Ferrer, Luciana McLaren, Mitchell
author	Ferrer, Luciana
author_facet	Ferrer, Luciana McLaren, Mitchell
author_role	author
author2	McLaren, Mitchell
author2_role	author
dc.subject.none.fl_str_mv	SPEAKER RECOGNITION PROBABILISTIC LINEAR DISCRIMINANT ANALYSIS
topic	SPEAKER RECOGNITION PROBABILISTIC LINEAR DISCRIMINANT ANALYSIS
purl_subject.fl_str_mv	https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1
description	Probabilistic linear discriminant analysis (PLDA) is the leading method for computing scores in speaker recognition systems. The method models the vectors representing each audio sample as a sum of three terms: one that depends on the speaker identity, one that models the within-speaker variability, and one that models any remaining variability. The last two terms are assumed to be independent across samples. We recently proposed an extension of the PLDA method, which we termed Joint PLDA (JPLDA), where the second term is considered dependent on the type of nuisance condition present in the data (e.g., the language or channel). The proposed method led to significant gains for multilanguage speaker recognition when taking language as the nuisance condition. In this paper, we present a generalization of this approach that allows for multiple nuisance terms. We show results using language and several nuisance conditions describing the acoustic characteristics of the sample and demonstrate that jointly including all these factors in the model leads to better results than including only language or acoustic condition factors. Overall, we obtain relative improvements in detection cost function between 5% and 47% for various systems and test conditions with respect to standard PLDA approaches.
publishDate	2018
dc.date.none.fl_str_mv	2018
dc.type.none.fl_str_mv	info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/conferenceObject Conferencia Book http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia
status_str	publishedVersion
format	conferenceObject
dc.identifier.none.fl_str_mv	http://hdl.handle.net/11336/162954 A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions; 19th Annual Conference of the International Speech Communication Association: Speech research for emerging markets in multilingual societies; Hyderabad; India; 2018; 82-86 CONICET Digital CONICET
url	http://hdl.handle.net/11336/162954
identifier_str_mv	A generalization of PLDA for joint modeling of speaker identity and multiple nuisance conditions; 19th Annual Conference of the International Speech Communication Association: Speech research for emerging markets in multilingual societies; Hyderabad; India; 2018; 82-86 CONICET Digital CONICET
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/https://www.isca-speech.org/archive/interspeech_2018/ferrer18_interspeech.html info:eu-repo/semantics/altIdentifier/doi/10.21437/Interspeech.2018-1280
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv	openAccess
rights_invalid_str_mv	https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv	application/pdf application/pdf
dc.coverage.none.fl_str_mv	Internacional
dc.publisher.none.fl_str_mv	International Speech Communication Association
publisher.none.fl_str_mv	International Speech Communication Association
dc.source.none.fl_str_mv	reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str	CONICET Digital (CONICET)
collection	CONICET Digital (CONICET)
instname_str	Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv	CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv	dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
