Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence

Authors
Tewari, Shrankhala; Toledo Margalef, Pablo Adrian; Kareem, Ayesha; Abdul Hussein, Ayah; White, Marina; Wazana, Ashley; Davidge, Sandra T.; Delrieux, Claudio Augusto; Connor, Kristin L.
Publication year
2021
Language
English
Resource type
article
Status
published version
Description
The Developmental Origins of Health and Disease (DOHaD) framework aims to understand how early-life exposures shape lifecycle health. To date, no comprehensive list of these exposures and their interactions has been developed, which limits our ability to predict trajectories of risk and resiliency in humans. To address this gap, we developed a model that uses text-mining, machine learning, and natural language processing approaches to automate search, data extraction, and content analysis of DOHaD-related research articles available in PubMed. Our first model captured 2469 articles, which were subsequently categorised into topics based on word frequencies within the titles and abstracts. Manual screening validated 848 of these as relevant; these were used to develop a revised model that captured 2098 articles, most of which fell under the most prominently researched domains related to our specific DOHaD focus. The articles were clustered by latent topic extraction, and 23 experts in the field independently labelled the perceived topics. Consensus analysis of this labelling yielded mostly fair to substantial agreement, which demonstrates that automated models can be developed to successfully retrieve and classify research literature, as a first step toward gathering evidence on the DOHaD risk and resilience factors that influence later-life human health.
Affiliation: Tewari, Shrankhala. Carleton University; Canada
Affiliation: Toledo Margalef, Pablo Adrian. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Centro Nacional Patagónico; Argentina
Affiliation: Kareem, Ayesha. Carleton University; Canada
Affiliation: Abdul Hussein, Ayah. Carleton University; Canada
Affiliation: White, Marina. Carleton University; Canada
Affiliation: Wazana, Ashley. McGill University; Canada
Affiliation: Davidge, Sandra T. University of Alberta; Canada
Affiliation: Delrieux, Claudio Augusto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Ingeniería Eléctrica y de Computadoras; Argentina
Affiliation: Connor, Kristin L. Carleton University; Canada
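The consensus analysis reports agreement using the wording of the Landis and Koch scale ("slight", "fair", "moderate", "substantial", "almost perfect"). That scale is normally applied to a kappa statistic. The abstract does not name the statistic, so the Python sketch below assumes Fleiss' kappa, the usual choice when more than two raters label each item. It also assumes that each expert's free-text topic label has already been mapped to one of a fixed set of categories. The function names `fleiss_kappa` and `landis_koch` and the example labels are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for inter-rater agreement (assumed statistic, see lead-in).

    `ratings` holds one list per item (e.g. per extracted topic); each inner
    list contains the category label that each rater assigned to that item.
    Every item must be rated by the same number of raters (at least two).
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    n_items = len(ratings)
    category_totals = Counter()  # how often each category was used overall
    observed = 0.0               # running sum of per-item agreement P_i

    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # P_i: fraction of ordered rater pairs that agree on this item
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed += agreeing_pairs / (n_raters * (n_raters - 1))

    p_bar = observed / n_items
    total = n_items * n_raters
    # Chance agreement: sum of squared overall category proportions
    p_e = sum((c / total) ** 2 for c in category_totals.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating uses one category")
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch agreement bands."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical example: three raters assign each of four topics to a category.
ratings = [
    ["diet", "diet", "diet"],
    ["stress", "stress", "stress"],
    ["diet", "diet", "stress"],
    ["toxicants", "toxicants", "toxicants"],
]
kappa = fleiss_kappa(ratings)
print(f"{kappa:.3f} {landis_koch(kappa)}")
```

In this made-up example, `fleiss_kappa` returns 35/47 ≈ 0.745, which `landis_koch` reads as "substantial". With 23 raters, each inner list would simply hold 23 labels.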
Subject
MACHINE LEARNING
TEXT MINING
DEVELOPMENTAL ORIGINS OF HEALTH AND DISEASE
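The description mentions automating the search of PubMed, but it does not say which interface or query the authors used. As one possibility, PubMed can be queried programmatically through NCBI's E-utilities. The sketch below uses only the Python standard library: it builds an ESearch request and reads back the matching PubMed IDs. The query phrases are illustrative placeholders, not the authors' search strategy. The helper names `build_esearch_url` and `fetch_pmids` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint for searching Entrez databases such as PubMed
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(phrases, retmax=100):
    """Build an ESearch URL that ORs together quoted phrases in PubMed."""
    # Quoting each phrase asks PubMed for a phrase search rather than separate words
    term = " OR ".join(f'"{p}"' for p in phrases)
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def fetch_pmids(url, timeout=30):
    """Run the search (needs network access) and return the matching PubMed IDs."""
    with urlopen(url, timeout=timeout) as response:
        result = json.load(response)
    return result["esearchresult"]["idlist"]


# Illustrative query phrases only
url = build_esearch_url(["DOHaD", "early life exposure"], retmax=50)
print(url)
```

Calling `fetch_pmids(url)` returns at most `retmax` IDs as strings. NCBI's usage guidelines cap how many requests a client may send per second, so a real pipeline should throttle its calls before downloading titles and abstracts for the returned IDs.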
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/159933

Field of research (FORD)
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
Publication date
2021-10-22
Handle
http://hdl.handle.net/11336/159933
Citation
Tewari, Shrankhala; Toledo Margalef, Pablo Adrian; Kareem, Ayesha; Abdul Hussein, Ayah; White, Marina; et al. Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence. Journal of Personalized Medicine, vol. 11, no. 11, pp. 1-13. Multidisciplinary Digital Publishing Institute; 22 October 2021.
ISSN
2075-4426
Publisher version
https://www.mdpi.com/2075-4426/11/11/1064
DOI
10.3390/jpm11111064
Format
application/pdf
Publisher
Multidisciplinary Digital Publishing Institute