Alzheimer disease recognition using speech-based embeddings from pre-trained models
- Authors
- Gauder, María Lara; Pepino, Leonardo Daniel; Ferrer, Luciana; Riera, Pablo
- Publication year
- 2021
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- This paper describes our submission to the ADreSSo Challenge, which focuses on the problem of automatic recognition of Alzheimer's Disease (AD) from speech. The audio samples contain speech from the subjects describing a picture with the guidance of an experimenter. Our approach to the problem is based on the use of embeddings extracted from different pretrained models - trill, allosaurus, and wav2vec 2.0 - which were trained to solve different speech tasks. These features are modeled with a neural network that takes short segments of speech as input, generating an AD score per segment. The final score for an audio file is given by the average over all segments in the file. We include ablation results to show the performance of different feature types individually and in combination, a study of the effect of the segment size, and an analysis of statistical significance. Our results on the test data for the challenge reach an accuracy of 78.9%, outperforming both the acoustic and linguistic baselines provided by the organizers.
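The scoring scheme in the abstract (one AD score per short segment, averaged into a single score per audio file) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the logistic scorer stands in for the paper's neural network, and the names `split_into_segments`, `segment_score`, and `file_score`, along with the segment length and hop values, are assumptions made for the example.

```python
import numpy as np


def split_into_segments(frames, segment_len, hop):
    """Cut a (num_frames, dim) embedding sequence into overlapping windows.

    Assumes num_frames >= segment_len; trailing frames that do not fill a
    whole window are dropped.
    """
    last_start = max(len(frames) - segment_len, 0)
    starts = range(0, last_start + 1, hop)
    return np.stack([frames[s:s + segment_len] for s in starts])


def segment_score(segment, weights, bias):
    """Hypothetical stand-in scorer: mean-pool the segment over time,
    then apply a logistic model to obtain a score in (0, 1)."""
    pooled = segment.mean(axis=0)
    return 1.0 / (1.0 + np.exp(-(pooled @ weights + bias)))


def file_score(frames, weights, bias, segment_len=100, hop=50):
    """File-level score: the average of the per-segment scores."""
    segments = split_into_segments(frames, segment_len, hop)
    scores = np.array([segment_score(s, weights, bias) for s in segments])
    return float(scores.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder embeddings: 250 frames of 4 dimensions (not real features).
    frames = rng.normal(size=(250, 4))
    weights = rng.normal(size=4)
    print(file_score(frames, weights, bias=0.0))
```

Averaging over segments lets a model trained on short windows produce one decision per recording, whatever the recording's length.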
Affiliation: Gauder, María Lara. Universidad de Buenos Aires; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Pepino, Leonardo Daniel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Ferrer, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina. Universidad de Buenos Aires; Argentina
Affiliation: Riera, Pablo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
- Subject
- ADRESSO CHALLENGE
- ALZHEIMER'S DISEASE RECOGNITION
- COMPUTATIONAL PARALINGUISTICS
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/228670
| id | CONICETDig_94447b15fefbc0fcbf95a81353265ad6 |
|---|---|
| oai_identifier_str | oai:ri.conicet.gov.ar:11336/228670 |
| network_acronym_str | CONICETDig |
| repository_id_str | 3498 |
| network_name_str | CONICET Digital (CONICET) |
| dc.title.none.fl_str_mv | Alzheimer disease recognition using speech-based embeddings from pre-trained models |
| dc.creator.none.fl_str_mv | Gauder, María Lara; Pepino, Leonardo Daniel; Ferrer, Luciana; Riera, Pablo |
| author | Gauder, María Lara |
| author_role | author |
| author2 | Pepino, Leonardo Daniel; Ferrer, Luciana; Riera, Pablo |
| author2_role | author; author; author |
| dc.subject.none.fl_str_mv | ADRESSO CHALLENGE; ALZHEIMER'S DISEASE RECOGNITION; COMPUTATIONAL PARALINGUISTICS |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2; https://purl.org/becyt/ford/1 |
| publishDate | 2021 |
| dc.date.none.fl_str_mv | 2021-09 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| url | http://hdl.handle.net/11336/228670 |
| identifier_str_mv | Gauder, María Lara; Pepino, Leonardo Daniel; Ferrer, Luciana; Riera, Pablo; Alzheimer disease recognition using speech-based embeddings from pre-trained models; International Speech Communication Association; Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; 6; 9-2021; 4186-4190 2308-457X 1990-9772 CONICET Digital CONICET |
| dc.language.none.fl_str_mv | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/doi/10.21437/Interspeech.2021-753; info:eu-repo/semantics/altIdentifier/url/https://www.isca-archive.org/interspeech_2021/gauder21_interspeech.html |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | International Speech Communication Association |
| reponame_str | CONICET Digital (CONICET) |
| instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
| _version_ | 1858305561008799744 |
| score | 13.176822 |