Handling missing values in trait data

Authors: Johnson, Thomas F.; Isaac, Nick J. B.; Paviolo, Agustin Javier; González Suárez, Manuela
Publication year: 2021
Language: English
Resource type: article
Status: published version
Description: Aim: Trait data are widely used in ecological and evolutionary phylogenetic comparative studies, but often values are not available for all species of interest. Traditionally, researchers have excluded species without data from analyses, but estimation of missing values using imputation has been proposed as a better approach. However, imputation methods have largely been designed for randomly missing data, whereas trait data are often not missing at random (e.g., more data for bigger species). Here, we evaluate the performance of approaches for handling missing values when considering biased datasets. Location: Any. Time period: Any. Major taxa studied: Any. Methods: We simulated continuous traits and separate response variables to test the performance of nine imputation methods and complete-case analysis (excluding missing values from the dataset) under biased missing data scenarios. We characterized performance by estimating the error in imputed trait values (deviation from the true value) and inferred trait–response relationships (deviation from the true relationship between a trait and response). Results: Generally, Rphylopars imputation produced the most accurate estimate of missing values and best preserved the response–trait slope. However, estimates of missing data were still inaccurate, even with only 5% of values missing. Under severe biases, errors were high with every approach. Imputation was not always the best option, with complete-case analysis frequently outperforming Mice imputation and, to a lesser degree, BHPMF imputation. Mice, a popular approach, performed poorly when the response variable was excluded from the imputation model. Main conclusions: Imputation can handle missing data effectively in some conditions but is not always the best solution. None of the methods we tested could deal effectively with severe biases, which can be common in trait datasets. We recommend rigorous data checking for biases before and after imputation and propose variables that can assist researchers working with incomplete datasets to detect data biases and minimize errors.
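As a rough illustration of the setup the abstract describes, the following Python sketch simulates a continuous trait and a response, removes trait values either at random or with a bias against small values, and then measures the error of the imputed values and the complete-case slope. It is a minimal sketch under stated assumptions, not the authors' simulation: the sample size, the true slope, the 30% missingness, the exponential bias weights, and the use of mean imputation as a simple stand-in are all assumptions introduced here, and every function name is hypothetical.

```python
import numpy as np

TRUE_SLOPE = 0.5  # assumed true trait-response slope (illustrative only)


def simulate(n=2000, seed=0):
    """Simulate a continuous trait and a linearly related response."""
    rng = np.random.default_rng(seed)
    trait = rng.normal(0.0, 1.0, n)
    response = TRUE_SLOPE * trait + rng.normal(0.0, 1.0, n)
    return rng, trait, response


def missing_mask(rng, trait, frac, biased):
    """Return a boolean mask marking which trait values are removed.

    With biased=True, smaller trait values are more likely to be removed
    (assumed weighting exp(-2 * trait)); otherwise removal is uniform.
    """
    n = trait.size
    k = int(round(frac * n))
    p = None
    if biased:
        weights = np.exp(-2.0 * trait)
        p = weights / weights.sum()
    idx = rng.choice(n, size=k, replace=False, p=p)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask


def mean_imputation_rmse(trait, mask):
    """RMSE between the true removed values and the observed-mean fill."""
    fill = trait[~mask].mean()
    return float(np.sqrt(np.mean((trait[mask] - fill) ** 2)))


def complete_case_slope(trait, response, mask):
    """Least-squares slope of response on trait using observed rows only."""
    return float(np.polyfit(trait[~mask], response[~mask], 1)[0])


rng, trait, response = simulate()
results = {}
for label, biased in (("random", False), ("biased", True)):
    mask = missing_mask(rng, trait, frac=0.3, biased=biased)
    results[label] = {
        "mask": mask,
        "rmse": mean_imputation_rmse(trait, mask),
        "slope": complete_case_slope(trait, response, mask),
    }
    print(label, results[label]["rmse"], results[label]["slope"] - TRUE_SLOPE)
```

Under these assumed settings the biased scenario should yield a larger imputation error than the random one, because the mean of the observed values overestimates the smaller values that were removed.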
Affiliation: Johnson, Thomas F. University of Reading; United Kingdom
Affiliation: Isaac, Nick J. B. Centre For Ecology And Hydrology; United Kingdom
Affiliation: Paviolo, Agustin Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste. Instituto de Biología Subtropical. Instituto de Biología Subtropical - Nodo Puerto Iguazú | Universidad Nacional de Misiones. Instituto de Biología Subtropical. Instituto de Biología Subtropical - Nodo Puerto Iguazú; Argentina. Centro de Investigaciones del Bosque Atlántico; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Nordeste; Argentina
Affiliation: González Suárez, Manuela. University of Reading; United Kingdom
Subject: BHPMF
FUNCTIONAL TRAIT
IMPUTATION
LIFE-HISTORY TRAIT
MAR
MCAR
MISSING DATA
MNAR
MULTIPLE IMPUTATION CHAINED EQUATIONS
RPHYLOPARS
Access level: open access
Terms of use: https://creativecommons.org/licenses/by/2.5/ar/
Repository: CONICET Digital (CONICET)
Institution: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier: oai:ri.conicet.gov.ar:11336/168014

Subject classification: https://purl.org/becyt/ford/1.6; https://purl.org/becyt/ford/1
Publication date: 2021-01
Handle: http://hdl.handle.net/11336/168014
Citation: Johnson, Thomas F.; Isaac, Nick J. B.; Paviolo, Agustin Javier; González Suárez, Manuela; Handling missing values in trait data; Wiley Blackwell Publishing, Inc; Global Ecology and Biogeography; 30; 1; 1-2021; 51-62
ISSN: 1466-822X
DOI: 10.1111/geb.13185
Publisher URL: https://onlinelibrary.wiley.com/doi/10.1111/geb.13185
Format: application/pdf
Publisher: Wiley Blackwell Publishing, Inc
