Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence
- Authors
- Tewari, Shrankhala; Toledo Margalef, Pablo Adrian; Kareem, Ayesha; Abdul Hussein, Ayah; White, Marina; Wazana, Ashley; Davidge, Sandra T.; Delrieux, Claudio Augusto; Connor, Kristin L.
- Year of publication
- 2021
- Language
- English
- Resource type
- article
- Status
- published version
- Description
The Developmental Origins of Health and Disease (DOHaD) framework aims to understand how early life exposures shape lifecycle health. To date, no comprehensive list of these exposures and their interactions has been developed, which limits our ability to predict trajectories of risk and resiliency in humans. To address this gap, we developed a model that uses text-mining, machine learning, and natural language processing approaches to automate search, data extraction, and content analysis from DOHaD-related research articles available in PubMed. Our first model captured 2469 articles, which were subsequently categorised into topics based on word frequencies within the titles and abstracts. A manual screening validated 848 of these as relevant, which were used to develop a revised model that finally captured 2098 articles that largely fell under the most prominently researched domains related to our specific DOHaD focus. The articles were clustered according to latent topic extraction, and 23 experts in the field independently labelled the perceived topics. Consensus analysis of this labelling showed mostly fair to substantial agreement, which demonstrates that automated models can be developed to successfully retrieve and classify research literature, as a first step to gather evidence related to DOHaD risk and resilience factors that influence later-life human health.
Affiliation: Tewari, Shrankhala. Carleton University; Canada
Affiliation: Toledo Margalef, Pablo Adrian. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Centro Nacional Patagónico; Argentina
Affiliation: Kareem, Ayesha. Carleton University; Canada
Affiliation: Abdul Hussein, Ayah. Carleton University; Canada
Affiliation: White, Marina. Carleton University; Canada
Affiliation: Wazana, Ashley. McGill University; Canada
Affiliation: Davidge, Sandra T. University of Alberta; Canada
Affiliation: Delrieux, Claudio Augusto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Ingeniería Eléctrica y de Computadoras; Argentina
Affiliation: Connor, Kristin L. Carleton University; Canada
- Subjects
- MACHINE LEARNING; TEXT MINING; DEVELOPMENTAL ORIGINS OF HEALTH AND DISEASE
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/159933
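The abstract says the pipeline automated search over PubMed, but this record gives neither the query nor the tooling. One common way to script that step is NCBI's E-utilities ESearch endpoint. The sketch below assumes that endpoint and only builds the request URL; the helper name `build_esearch_url` and the example query are hypothetical illustrations, not the authors' search strategy.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (returns matching PubMed IDs).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, retmax: int = 100, retstart: int = 0) -> str:
    """Return an ESearch URL that asks PubMed for PMIDs matching `term` as JSON."""
    params = {
        "db": "pubmed",        # search the PubMed database
        "term": term,          # Entrez query string
        "retmax": retmax,      # number of IDs to return in this page
        "retstart": retstart,  # offset of the first ID, for paging
        "retmode": "json",     # JSON instead of the default XML
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query; the record does not publish the authors' search string.
    query = '("developmental origins" OR DOHaD) AND ("early life" OR prenatal) AND humans[mh]'
    print(build_esearch_url(query, retmax=20))
```

Sending the URL (for example with `urllib.request.urlopen`) returns JSON whose `esearchresult.idlist` holds the matching PubMed IDs; without an API key, NCBI asks clients to stay at or below three requests per second.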
Full record metadata
| id | CONICETDig_69edefeb1399d9060d1868fb56cd2556 |
|---|---|
| oai_identifier_str | oai:ri.conicet.gov.ar:11336/159933 |
| network_acronym_str | CONICETDig |
| repository_id_str | 3498 |
| network_name_str | CONICET Digital (CONICET) |
| dc.title.none.fl_str_mv | Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence |
| dc.creator.none.fl_str_mv | Tewari, Shrankhala; Toledo Margalef, Pablo Adrian; Kareem, Ayesha; Abdul Hussein, Ayah; White, Marina; Wazana, Ashley; Davidge, Sandra T.; Delrieux, Claudio Augusto; Connor, Kristin L. |
| author | Tewari, Shrankhala |
| author_role | author |
| author2 | Toledo Margalef, Pablo Adrian; Kareem, Ayesha; Abdul Hussein, Ayah; White, Marina; Wazana, Ashley; Davidge, Sandra T.; Delrieux, Claudio Augusto; Connor, Kristin L. |
| author2_role | author author author author author author author author |
| dc.subject.none.fl_str_mv | MACHINE LEARNING; TEXT MINING; DEVELOPMENTAL ORIGINS OF HEALTH AND DISEASE |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
| publishDate | 2021 |
| dc.date.none.fl_str_mv | 2021-10-22 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/159933 Tewari, Shrankhala; Toledo Margalef, Pablo Adrian; Kareem, Ayesha; Abdul Hussein, Ayah; White, Marina; et al.; Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence; Multidisciplinary Digital Publishing Institute; Journal of Personalized Medicine; 11; 11; 22-10-2021; 1-13 2075-4426 CONICET Digital CONICET |
| url | http://hdl.handle.net/11336/159933 |
| dc.language.none.fl_str_mv | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://www.mdpi.com/2075-4426/11/11/1064 info:eu-repo/semantics/altIdentifier/doi/10.3390/jpm11111064 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf application/pdf |
| dc.publisher.none.fl_str_mv | Multidisciplinary Digital Publishing Institute |
| dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| reponame_str | CONICET Digital (CONICET) |
| instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
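The abstract reports that consensus among the 23 experts' topic labels was mostly "fair to substantial", but this record does not name the agreement statistic. Those words are the Landis and Koch (1977) bands usually applied to kappa. The sketch below assumes Fleiss' kappa, and it assumes that each expert's free-text label has already been mapped to one of a shared set of categories. The function names and toy counts are hypothetical and are neither the authors' code nor their data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N items, each labelled by the same number of raters.

    counts[i][j] is how many raters assigned item i (e.g. a topic) to
    category j (e.g. a harmonised label).
    """
    if not counts:
        raise ValueError("need at least one item")
    n_items = len(counts)
    n_categories = len(counts[0])
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    for row in counts:
        if len(row) != n_categories or sum(row) != n_raters:
            raise ValueError("every item needs the same categories and rater count")

    # Mean observed agreement: share of agreeing rater pairs per item.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Expected chance agreement from the overall category proportions.
    total = n_items * n_raters
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2 for j in range(n_categories)
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch (1977) descriptive bands."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Toy data: 4 topics, 3 raters, 2 label categories (hypothetical values).
    toy = [[2, 1], [1, 2], [3, 0], [0, 3]]
    k = fleiss_kappa(toy)
    print(round(k, 3), landis_koch(k))  # prints: 0.333 fair
```

Fleiss' kappa requires every item to be rated by the same number of raters. If some experts skipped topics, a statistic that tolerates missing ratings, such as Krippendorff's alpha, would be the more natural choice.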