Alzheimer disease recognition using speech-based embeddings from pre-trained models

Authors
Gauder, María Lara; Pepino, Leonardo Daniel; Ferrer, Luciana; Riera, Pablo
Publication year
2021
Language
English
Resource type
article
Status
published version
Description
This paper describes our submission to the ADreSSo Challenge, which focuses on the problem of automatic recognition of Alzheimer's Disease (AD) from speech. The audio samples contain speech from the subjects describing a picture with the guidance of an experimenter. Our approach to the problem is based on the use of embeddings extracted from different pretrained models - trill, allosaurus, and wav2vec 2.0 - which were trained to solve different speech tasks. These features are modeled with a neural network that takes short segments of speech as input, generating an AD score per segment. The final score for an audio file is given by the average over all segments in the file. We include ablation results to show the performance of different feature types individually and in combination, a study of the effect of the segment size, and an analysis of statistical significance. Our results on the test data for the challenge reach an accuracy of 78.9%, outperforming both the acoustic and linguistic baselines provided by the organizers.
Affiliation: Gauder, María Lara. Universidad de Buenos Aires; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Pepino, Leonardo Daniel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Ferrer, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina. Universidad de Buenos Aires; Argentina
Affiliation: Riera, Pablo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
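The scoring scheme summarized in the abstract (a score per short segment, averaged into one score per audio file) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frame-based segmentation with a hop size, and the toy scorer standing in for the neural network are all assumptions made for illustration only.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one embedding vector per frame (hypothetical layout)


def split_into_segments(
    frames: Sequence[Frame], segment_len: int, hop: int
) -> List[List[Frame]]:
    """Cut a frame sequence into fixed-length, possibly overlapping segments.

    A file shorter than one segment is returned as a single short segment.
    """
    if segment_len <= 0 or hop <= 0:
        raise ValueError("segment_len and hop must be positive")
    if len(frames) == 0:
        return []
    if len(frames) <= segment_len:
        return [list(frames)]
    last_start = len(frames) - segment_len
    return [list(frames[s:s + segment_len]) for s in range(0, last_start + 1, hop)]


def file_score(
    frames: Sequence[Frame],
    segment_scorer: Callable[[List[Frame]], float],
    segment_len: int,
    hop: int,
) -> float:
    """Score every segment, then average the segment scores for the file."""
    segments = split_into_segments(frames, segment_len, hop)
    if not segments:
        raise ValueError("cannot score an empty file")
    return fmean([segment_scorer(seg) for seg in segments])


def toy_scorer(segment: List[Frame]) -> float:
    """Placeholder for the segment-level model: mean of the first feature."""
    return fmean([frame[0] for frame in segment])


# Four one-dimensional "embeddings", split into two non-overlapping segments.
demo_frames = [[0.0], [0.0], [1.0], [1.0]]
demo_score = file_score(demo_frames, toy_scorer, segment_len=2, hop=2)
```

In this toy example the two segments score 0.0 and 1.0, so the file-level score is their mean. The segment length is the parameter whose effect the abstract says the paper studies; here it is simply an argument.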
Subject
ADRESSO CHALLENGE
ALZHEIMER'S DISEASE RECOGNITION
COMPUTATIONAL PARALINGUISTICS
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/228670

Subject classification (FORD)
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
Publication date
2021-09
Handle
http://hdl.handle.net/11336/228670
Citation
Gauder, María Lara; Pepino, Leonardo Daniel; Ferrer, Luciana; Riera, Pablo; Alzheimer disease recognition using speech-based embeddings from pre-trained models; International Speech Communication Association; Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; 6; 9-2021; 4186-4190
ISSN
2308-457X
1990-9772
DOI
10.21437/Interspeech.2021-753
URL
https://www.isca-archive.org/interspeech_2021/gauder21_interspeech.html
Format
application/pdf
Publisher
International Speech Communication Association
Repository contact
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar