Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP

Authors
Scavuzzo, Carlos Matias; Scavuzzo, Juan Manuel; Campero, Micaela Natalia; Anegagrie, Melaku; Aramendia, Aranzazu Amor; Benito, Agustín; Periago, Maria Victoria
Publication year
2022
Language
English
Resource type
article
Status
published version
Description
In landscape epidemiology, machine learning (ML) is emerging as a good alternative for modeling epidemiological risk scenarios. This study aims to break with the "black box" paradigm that underlies the application of machine learning techniques by using SHAP to determine the contribution of each variable in ML models applied to geospatial health. The case study is the prevalence of hookworms, a group of intestinal parasites, in Ethiopia, where they are widely distributed; the country bears the third-highest burden of hookworm in Sub-Saharan Africa. XGBoost, a very popular ML model, was used to fit and analyze the data. The Python SHAP library was used to understand the importance of each variable for the predictions of the trained model. The contribution of these variables to a particular prediction was described using different types of plots. The results show that ML models have an advantage over classical statistical models: they not only produce similar results but also explain, through the SHAP package, the influence of the variables and the interactions between them in the generated models. This analysis provides information that helps to understand the epidemiological problem presented and offers a tool for similar studies.
Affiliation: Scavuzzo, Carlos Matias. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina. Comisión Nacional de Actividades Espaciales. Instituto de Altos Estudios Espaciales "Mario Gulich"; Argentina
Affiliation: Scavuzzo, Juan Manuel. Comisión Nacional de Actividades Espaciales. Instituto de Altos Estudios Espaciales "Mario Gulich"; Argentina
Affiliation: Campero, Micaela Natalia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina. Comisión Nacional de Actividades Espaciales. Instituto de Altos Estudios Espaciales "Mario Gulich"; Argentina
Affiliation: Anegagrie, Melaku. Fundación Mundo Sano; Argentina. Instituto de Salud Carlos III; Spain
Affiliation: Aramendia, Aranzazu Amor. Fundación Mundo Sano; Argentina. Instituto de Salud Carlos III; Spain
Affiliation: Benito, Agustín. Instituto de Salud Carlos III; Spain
Affiliation: Periago, Maria Victoria. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Fundación Mundo Sano; Argentina
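The description above refers to using SHAP to attribute a model's prediction to each input variable. As a minimal sketch of the underlying idea, and not the authors' code or the SHAP library's API, the following computes exact Shapley values by brute-force enumeration for a hypothetical three-variable toy model. The function names, the variable names (`temperature`, `moisture`, `elevation`), the coefficients, and the all-zero baseline are assumptions made only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Coalition features come from x, all others from the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return model(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v):
    # Hypothetical "risk" score with an interaction between the first two inputs.
    temperature, moisture, elevation = v
    return 0.5 * temperature + 0.2 * temperature * moisture - 0.1 * elevation


x = [2.0, 3.0, 1.0]          # hypothetical observation
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point
phi = shapley_values(toy_model, x, baseline)
print(dict(zip(["temperature", "moisture", "elevation"], phi)))
```

In this toy case the attributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is split evenly between the two variables involved. For tree ensembles such as XGBoost, the SHAP library relies on faster model-specific algorithms rather than this exponential enumeration.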
Subject
ETHIOPIA
HOOKWORM
MACHINE LEARNING
REMOTE SENSING
SHAP
SHAPLEY
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/200813

id CONICETDig_05e581987b8210ea22249b694b6b8b9d
oai_identifier_str oai:ri.conicet.gov.ar:11336/200813
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2022
dc.date.none.fl_str_mv 2022-03
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/200813
Scavuzzo, Carlos Matias; Scavuzzo, Juan Manuel; Campero, Micaela Natalia; Anegagrie, Melaku; Aramendia, Aranzazu Amor; et al.; Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP; KeAi Communications Co.; Infectious Disease Modelling; 7; 1; 3-2022; 262-276
2468-0427
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://linkinghub.elsevier.com/retrieve/pii/S2468042722000045
info:eu-repo/semantics/altIdentifier/doi/10.1016/j.idm.2022.01.004
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
application/pdf
application/pdf
dc.publisher.none.fl_str_mv KeAi Communications Co.
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar