Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP
- Authors
- Scavuzzo, Carlos Matias; Scavuzzo, Juan Manuel; Campero, Micaela Natalia; Anegagrie, Melaku; Aramendia, Aranzazu Amor; Benito, Agustín; Periago, Maria Victoria
- Publication year
- 2022
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- In landscape epidemiology, machine learning (ML) is a good alternative for modeling epidemiological risk scenarios. This study aims to break with the "black box" paradigm that underlies the application of automatic learning techniques by using SHAP to determine the contribution of each variable in ML models applied to geospatial health. The case study is the prevalence of hookworms, intestinal parasites that are widely distributed in Ethiopia; the country bears the third-highest burden of hookworm in Sub-Saharan Africa. XGBoost, a very popular ML model, was used to fit and analyze the data. The Python SHAP library was used to understand the importance of each variable for the predictions of the trained model. The contribution of these variables to a particular prediction was described using different types of plots. The results show that the ML models are superior to the classical statistical models: they not only achieve similar results but also, through the SHAP package, explain the influence of and interactions between the variables in the generated models. This analysis provides information to help understand the epidemiological problem presented and provides a tool for similar studies.
Affiliation: Scavuzzo, Carlos Matias. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina. Comisión Nacional de Actividades Espaciales. Instituto de Altos Estudios Espaciales "Mario Gulich"; Argentina
Affiliation: Scavuzzo, Juan Manuel. Comisión Nacional de Actividades Espaciales. Instituto de Altos Estudios Espaciales "Mario Gulich"; Argentina
Affiliation: Campero, Micaela Natalia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina. Comisión Nacional de Actividades Espaciales. Instituto de Altos Estudios Espaciales "Mario Gulich"; Argentina
Affiliation: Anegagrie, Melaku. Fundación Mundo Sano; Argentina. Instituto de Salud Carlos III; Spain
Affiliation: Aramendia, Aranzazu Amor. Fundación Mundo Sano; Argentina. Instituto de Salud Carlos III; Spain
Affiliation: Benito, Agustín. Instituto de Salud Carlos III; Spain
Affiliation: Periago, Maria Victoria. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Fundación Mundo Sano; Argentina
- Subject
- ETHIOPIA; HOOKWORM; MACHINE LEARNING; REMOTE SENSING; SHAP; SHAPLEY
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/200813
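The description above refers to using SHAP to determine the contribution of each variable to a particular prediction. As a rough illustration of the underlying idea only, the following minimal sketch computes exact Shapley values for a toy function in plain Python. Everything in it is an assumption made for illustration: the function `toy_model`, the feature names `temperature` and `rainfall`, and the zero baseline are hypothetical and are not taken from the article, and this is not the authors' XGBoost model or the SHAP library's own algorithm, which relies on more efficient methods for tree models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features that are absent from a coalition take their baseline value.
    The cost grows exponentially with the number of features, so this is
    only practical for very small examples.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for target in names:
        others = [f for f in names if f != target]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {f: (x[f] if f in coalition else baseline[f]) for f in names}
                with_target = dict(without)
                with_target[target] = x[target]
                # Marginal contribution of the target feature to this coalition.
                total += weight * (predict(with_target) - predict(without))
        phi[target] = total
    return phi


def toy_model(v):
    """Hypothetical risk score with an interaction term (illustration only)."""
    return 0.5 * v["temperature"] + 0.2 * v["rainfall"] + 0.1 * v["temperature"] * v["rainfall"]


baseline = {"temperature": 0.0, "rainfall": 0.0}  # assumed reference point
sample = {"temperature": 2.0, "rainfall": 3.0}    # assumed observation
contributions = shapley_values(toy_model, sample, baseline)
```

The per-feature contributions add up to the difference between the prediction for `sample` and the prediction for `baseline`, which is the additivity property that makes this kind of attribution readable as "how much each variable moved the prediction".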
Full record metadata
id | CONICETDig_05e581987b8210ea22249b694b6b8b9d |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/200813 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP |
dc.creator.none.fl_str_mv | Scavuzzo, Carlos Matias; Scavuzzo, Juan Manuel; Campero, Micaela Natalia; Anegagrie, Melaku; Aramendia, Aranzazu Amor; Benito, Agustín; Periago, Maria Victoria |
dc.subject.none.fl_str_mv | ETHIOPIA; HOOKWORM; MACHINE LEARNING; REMOTE SENSING; SHAP; SHAPLEY |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2; https://purl.org/becyt/ford/1 |
publishDate | 2022 |
dc.date.none.fl_str_mv | 2022-03 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/200813 Scavuzzo, Carlos Matias; Scavuzzo, Juan Manuel; Campero, Micaela Natalia; Anegagrie, Melaku; Aramendia, Aranzazu Amor; et al.; Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP; KeAi Communications Co.; Infectious Disease Modelling; 7; 1; 3-2022; 262-276 2468-0427 CONICET Digital CONICET |
url | http://hdl.handle.net/11336/200813 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://linkinghub.elsevier.com/retrieve/pii/S2468042722000045; info:eu-repo/semantics/altIdentifier/doi/10.1016/j.idm.2022.01.004 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf; application/pdf; application/pdf; application/pdf |
dc.publisher.none.fl_str_mv | KeAi Communications Co. |
publisher.none.fl_str_mv | KeAi Communications Co. |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET); instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |