Alzheimer disease recognition using speech-based embeddings from pre-trained models

Authors: Gauder, María Lara; Pepino, Leonardo Daniel; Ferrer, Luciana; Riera, Pablo
Publication year: 2021
Language: English
Resource type: article
Status: published version
Description: This paper describes our submission to the ADreSSo Challenge, which focuses on automatic recognition of Alzheimer's Disease (AD) from speech. The audio samples contain speech from subjects describing a picture with the guidance of an experimenter. Our approach uses embeddings extracted from different pretrained models (trill, allosaurus, and wav2vec 2.0) that were trained to solve different speech tasks. These features are modeled with a neural network that takes short segments of speech as input and generates an AD score per segment. The final score for an audio file is the average over all segments in the file. We include ablation results showing the performance of the different feature types individually and in combination, a study of the effect of segment size, and an analysis of statistical significance. Our results on the challenge test data reach an accuracy of 78.9%, outperforming both the acoustic and linguistic baselines provided by the organizers.
Affiliation: Gauder, María Lara. Universidad de Buenos Aires; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Pepino, Leonardo Daniel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Ferrer, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina. Universidad de Buenos Aires; Argentina
Affiliation: Riera, Pablo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Subjects: ADRESSO CHALLENGE
ALZHEIMER'S DISEASE RECOGNITION
COMPUTATIONAL PARALINGUISTICS
Access level: open access
Terms of use: https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
Institution: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier: oai:ri.conicet.gov.ar:11336/228670
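
The description states that a network scores short speech segments and that the file-level score is the average over all segments. A minimal sketch of that aggregation step follows. It is an illustration under stated assumptions, not the authors' implementation: the segment length, the hop, the helper names, and the toy scorer standing in for the neural network are all hypothetical.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one embedding vector per time frame (assumed layout)


def split_into_segments(frames: Sequence[Frame], segment_len: int,
                        hop: int) -> List[Sequence[Frame]]:
    """Cut frame-level embeddings into fixed-length segments.

    A trailing chunk shorter than `segment_len` is dropped. A file shorter
    than one segment is kept whole as a single segment (an assumption).
    """
    if segment_len <= 0 or hop <= 0:
        raise ValueError("segment_len and hop must be positive")
    if len(frames) == 0:
        return []
    if len(frames) <= segment_len:
        return [frames]
    return [frames[start:start + segment_len]
            for start in range(0, len(frames) - segment_len + 1, hop)]


def file_score(frames: Sequence[Frame],
               segment_scorer: Callable[[Sequence[Frame]], float],
               segment_len: int, hop: int) -> float:
    """Average the per-segment scores into one score for the audio file."""
    segments = split_into_segments(frames, segment_len, hop)
    if not segments:
        raise ValueError("no frames to score")
    return fmean([segment_scorer(seg) for seg in segments])


def toy_scorer(segment: Sequence[Frame]) -> float:
    """Hypothetical stand-in for the network: mean of the first feature."""
    return fmean([frame[0] for frame in segment])


if __name__ == "__main__":
    # Four one-dimensional frames, split into two non-overlapping segments.
    frames = [[0.0], [0.0], [4.0], [4.0]]
    print(file_score(frames, toy_scorer, segment_len=2, hop=2))
```

In practice `toy_scorer` would be replaced by the trained model's forward pass on one segment. Averaging gives every segment equal weight, which matches the wording of the description.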


id	CONICETDig_94447b15fefbc0fcbf95a81353265ad6
oai_identifier_str	oai:ri.conicet.gov.ar:11336/228670
network_acronym_str	CONICETDig
repository_id_str	3498
network_name_str	CONICET Digital (CONICET)
dc.title.none.fl_str_mv	Alzheimer disease recognition using speech-based embeddings from pre-trained models
dc.creator.none.fl_str_mv	Gauder, María Lara Pepino, Leonardo Daniel Ferrer, Luciana Riera, Pablo
dc.subject.none.fl_str_mv	ADRESSO CHALLENGE ALZHEIMER'S DISEASE RECOGNITION COMPUTATIONAL PARALINGUISTICS
purl_subject.fl_str_mv	https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1
publishDate	2021
dc.date.none.fl_str_mv	2021-09
dc.type.none.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo
format	article
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://hdl.handle.net/11336/228670 Gauder, María Lara; Pepino, Leonardo Daniel; Ferrer, Luciana; Riera, Pablo; Alzheimer disease recognition using speech-based embeddings from pre-trained models; International Speech Communication Association; Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; 6; 9-2021; 4186-4190 2308-457X 1990-9772 CONICET Digital CONICET
url	http://hdl.handle.net/11336/228670
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/doi/10.21437/Interspeech.2021-753 info:eu-repo/semantics/altIdentifier/url/https://www.isca-archive.org/interspeech_2021/gauder21_interspeech.html
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv	openAccess
rights_invalid_str_mv	https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv	application/pdf application/pdf application/pdf
dc.publisher.none.fl_str_mv	International Speech Communication Association
publisher.none.fl_str_mv	International Speech Communication Association
dc.source.none.fl_str_mv	reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str	CONICET Digital (CONICET)
collection	CONICET Digital (CONICET)
instname_str	Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv	CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv	dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
