Handling missing values in trait data
- Authors
- Johnson, Thomas F.; Isaac, Nick J. B.; Paviolo, Agustin Javier; González Suárez, Manuela
- Publication year
- 2021
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Aim: Trait data are widely used in ecological and evolutionary phylogenetic comparative studies, but often values are not available for all species of interest. Traditionally, researchers have excluded species without data from analyses, but estimation of missing values using imputation has been proposed as a better approach. However, imputation methods have largely been designed for randomly missing data, whereas trait data are often not missing at random (e.g., more data for bigger species). Here, we evaluate the performance of approaches for handling missing values when considering biased datasets. Location: Any. Time period: Any. Major taxa studied: Any. Methods: We simulated continuous traits and separate response variables to test the performance of nine imputation methods and complete-case analysis (excluding missing values from the dataset) under biased missing data scenarios. We characterized performance by estimating the error in imputed trait values (deviation from the true value) and inferred trait–response relationships (deviation from the true relationship between a trait and response). Results: Generally, Rphylopars imputation produced the most accurate estimate of missing values and best preserved the response–trait slope. However, estimates of missing data were still inaccurate, even with only 5% of values missing. Under severe biases, errors were high with every approach. Imputation was not always the best option, with complete-case analysis frequently outperforming Mice imputation and, to a lesser degree, BHPMF imputation. Mice, a popular approach, performed poorly when the response variable was excluded from the imputation model. Main conclusions: Imputation can handle missing data effectively in some conditions but is not always the best solution. None of the methods we tested could deal effectively with severe biases, which can be common in trait datasets. 
We recommend rigorous data checking for biases before and after imputation and propose variables that can assist researchers working with incomplete datasets to detect data biases and minimize errors.
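The kind of biased-missingness experiment described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' simulation: the function name `simulate_biased_missingness`, its parameters, the rule that removes the smallest trait values, and the use of mean imputation as a stand-in for the nine methods tested are all assumptions made for illustration. It simulates a continuous trait and a response, deletes trait values in a size-biased way, and reports how far the observed mean and the imputed values drift from the truth, alongside the complete-case slope.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_biased_missingness(n=2000, true_slope=0.5, missing_frac=0.3, seed=1):
    """Hypothetical helper: simulate a trait and response, then remove
    trait values in a biased way (the smallest values go missing)."""
    rng = random.Random(seed)
    trait = [rng.gauss(0.0, 1.0) for _ in range(n)]
    response = [true_slope * t + rng.gauss(0.0, 1.0) for t in trait]

    # Assumed bias rule: species with the smallest trait values lack data.
    n_missing = int(missing_frac * n)
    by_size = sorted(range(n), key=lambda i: trait[i])
    missing = set(by_size[:n_missing])
    observed = [i for i in range(n) if i not in missing]

    # Complete-case analysis: fit the slope using observed species only.
    cc_slope = ols_slope([trait[i] for i in observed],
                         [response[i] for i in observed])

    # Mean imputation (illustrative stand-in): fill gaps with the observed mean.
    observed_mean = sum(trait[i] for i in observed) / len(observed)
    rmse = (sum((observed_mean - trait[i]) ** 2 for i in missing) / n_missing) ** 0.5

    return {
        "true_mean": sum(trait) / n,
        "observed_mean": observed_mean,
        "mean_imputation_rmse": rmse,
        "complete_case_slope": cc_slope,
    }


if __name__ == "__main__":
    for name, value in simulate_biased_missingness().items():
        print(f"{name}: {value:.3f}")
```

Under this assumed rule the observed mean overestimates the true mean, so any value imputed from it inherits the bias. The same comparison of the observed and full distributions is the sort of check for bias that the abstract recommends before and after imputation.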
Affiliation: Johnson, Thomas F.. University of Reading; United Kingdom
Affiliation: Isaac, Nick J. B.. Centre For Ecology And Hydrology; United Kingdom
Affiliation: Paviolo, Agustin Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste. Instituto de Biología Subtropical. Instituto de Biología Subtropical - Nodo Puerto Iguazú | Universidad Nacional de Misiones. Instituto de Biología Subtropical. Instituto de Biología Subtropical - Nodo Puerto Iguazú; Argentina. Centro de Investigaciones del Bosque Atlántico; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste; Argentina
Affiliation: González Suárez, Manuela. University of Reading; United Kingdom
- Subject
-
BHPMF
FUNCTIONAL TRAIT
IMPUTATION
LIFE-HISTORY TRAIT
MAR
MCAR
MISSING DATA
MNAR
MULTIPLE IMPUTATION CHAINED EQUATIONS
RPHYLOPARS
- Accessibility level
- open access
- Terms of use
- https://creativecommons.org/licenses/by/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/168014
Full record metadata
| id | CONICETDig_ad16cc811b8a0c3ce9e76c58cacbbb8b |
|---|---|
| oai_identifier_str | oai:ri.conicet.gov.ar:11336/168014 |
| network_acronym_str | CONICETDig |
| repository_id_str | 3498 |
| network_name_str | CONICET Digital (CONICET) |
| dc.title.none.fl_str_mv | Handling missing values in trait data |
| title | Handling missing values in trait data |
| spellingShingle | Handling missing values in trait data; Johnson, Thomas F.; BHPMF; FUNCTIONAL TRAIT; IMPUTATION; LIFE-HISTORY TRAIT; MAR; MCAR; MISSING DATA; MNAR; MULTIPLE IMPUTATION CHAINED EQUATIONS; RPHYLOPARS |
| title_short | Handling missing values in trait data |
| title_full | Handling missing values in trait data |
| title_fullStr | Handling missing values in trait data |
| title_full_unstemmed | Handling missing values in trait data |
| title_sort | Handling missing values in trait data |
| dc.creator.none.fl_str_mv | Johnson, Thomas F.; Isaac, Nick J. B.; Paviolo, Agustin Javier; González Suárez, Manuela |
| author | Johnson, Thomas F. |
| author_facet | Johnson, Thomas F.; Isaac, Nick J. B.; Paviolo, Agustin Javier; González Suárez, Manuela |
| author_role | author |
| author2 | Isaac, Nick J. B.; Paviolo, Agustin Javier; González Suárez, Manuela |
| author2_role | author; author; author |
| dc.subject.none.fl_str_mv | BHPMF; FUNCTIONAL TRAIT; IMPUTATION; LIFE-HISTORY TRAIT; MAR; MCAR; MISSING DATA; MNAR; MULTIPLE IMPUTATION CHAINED EQUATIONS; RPHYLOPARS |
| topic | BHPMF; FUNCTIONAL TRAIT; IMPUTATION; LIFE-HISTORY TRAIT; MAR; MCAR; MISSING DATA; MNAR; MULTIPLE IMPUTATION CHAINED EQUATIONS; RPHYLOPARS |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.6 https://purl.org/becyt/ford/1 |
| publishDate | 2021 |
| dc.date.none.fl_str_mv | 2021-01 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/168014 Johnson, Thomas F.; Isaac, Nick J. B.; Paviolo, Agustin Javier; González Suárez, Manuela; Handling missing values in trait data; Wiley Blackwell Publishing, Inc; Global Ecology and Biogeography; 30; 1; 1-2021; 51-62 1466-822X CONICET Digital CONICET |
| url | http://hdl.handle.net/11336/168014 |
| identifier_str_mv | Johnson, Thomas F.; Isaac, Nick J. B.; Paviolo, Agustin Javier; González Suárez, Manuela; Handling missing values in trait data; Wiley Blackwell Publishing, Inc; Global Ecology and Biogeography; 30; 1; 1-2021; 51-62 1466-822X CONICET Digital CONICET |
| dc.language.none.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/doi/10.1111/geb.13185 info:eu-repo/semantics/altIdentifier/url/https://onlinelibrary.wiley.com/doi/10.1111/geb.13185 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/2.5/ar/ |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | https://creativecommons.org/licenses/by/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf application/pdf |
| dc.publisher.none.fl_str_mv | Wiley Blackwell Publishing, Inc |
| publisher.none.fl_str_mv | Wiley Blackwell Publishing, Inc |
| dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| reponame_str | CONICET Digital (CONICET) |
| collection | CONICET Digital (CONICET) |
| instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |