The impact of missing data on real morphological phylogenies: Influence of the number and distribution of missing entries

Authors
Prevosti, Francisco Juan; Chemisquy, Maria Amelia
Year of publication
2010
Language
English
Resource type
article
Status
published version
Description
Here we explore the effect of missing data in phylogenetic analyses using a large number of real morphological matrices. Different percentages and patterns of missing entries were added to each matrix, and their influence was evaluated by comparing the accuracy and error of the most parsimonious trees. We also evaluated how accuracy and error relate to different parameters (e.g. the number of taxa and characters, homoplasy, support). Our findings, based on real matrices, agree with previous simulation studies: the negative effect increases with the percentage of missing entries and decreases as more characters are added. This indicates that the main problem is the lack of information, not just the presence of missing data per se. Accuracy varies with the distribution pattern of missing entries; the worst case is when missing data are concentrated in a few taxa, and the best is when missing entries are restricted to just a few characters. The results expand our knowledge of the missing data problem, corroborate many of the findings previously published using simulations, and could be useful for empirical or theoretical studies.
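The abstract does not say exactly how missing entries were placed or how accuracy and error were scored, so the following Python sketch is a hypothetical illustration, not the authors' procedure. `add_missing` (a hypothetical helper) masks a chosen fraction of cells in a taxa-by-character matrix under three assumed patterns: spread at random, concentrated in a few taxa, or concentrated in a few characters. `accuracy_and_error` (also hypothetical) assumes each tree is summarized as a set of clades, with accuracy taken as the share of reference-tree clades recovered and error as the share of estimated clades absent from the reference; the paper's own definitions may differ. The parsimony tree search itself, which would need a dedicated program, is left out.

```python
import random
from typing import FrozenSet, List, Set, Tuple

MISSING = "?"  # assumed missing-entry symbol
Clade = FrozenSet[str]  # a clade as the set of taxon names it contains


def add_missing(matrix: List[List[str]], fraction: float, pattern: str = "random",
                n_targets: int = 2, seed: int = 0) -> List[List[str]]:
    """Return a copy of a taxa x characters matrix with ~fraction of cells set to MISSING.

    Hypothetical patterns (assumptions, not the paper's exact protocol):
      "random"     - missing cells drawn from the whole matrix
      "taxa"       - missing cells drawn only from n_targets randomly chosen taxa (rows)
      "characters" - missing cells drawn only from n_targets randomly chosen characters (columns)
    """
    rng = random.Random(seed)
    n_taxa, n_chars = len(matrix), len(matrix[0])
    if pattern == "random":
        cells = [(i, j) for i in range(n_taxa) for j in range(n_chars)]
    elif pattern == "taxa":
        rows = rng.sample(range(n_taxa), n_targets)
        cells = [(i, j) for i in rows for j in range(n_chars)]
    elif pattern == "characters":
        cols = rng.sample(range(n_chars), n_targets)
        cells = [(i, j) for i in range(n_taxa) for j in cols]
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    n_missing = round(fraction * n_taxa * n_chars)
    if n_missing > len(cells):
        raise ValueError("too few target rows/columns for the requested fraction")
    out = [row[:] for row in matrix]  # copy so the original matrix is untouched
    for i, j in rng.sample(cells, n_missing):
        out[i][j] = MISSING
    return out


def accuracy_and_error(reference: Set[Clade], estimate: Set[Clade]) -> Tuple[float, float]:
    """Assumed clade-based scores comparing a tree from a degraded matrix to a reference tree."""
    shared = len(reference & estimate)
    accuracy = shared / len(reference) if reference else 1.0
    error = (len(estimate) - shared) / len(estimate) if estimate else 0.0
    return accuracy, error
```

For example, on a 4 x 4 matrix, `add_missing(matrix, 0.25, pattern="taxa", n_targets=1)` blanks one entire taxon, the kind of concentration the abstract reports as most damaging, while `pattern="characters"` confines the same number of missing cells to a few columns.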
Affiliation: Prevosti, Francisco Juan. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Museo Argentino de Ciencias Naturales “Bernardino Rivadavia”; Argentina
Affiliation: Chemisquy, Maria Amelia. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Botánica Darwinion. Academia Nacional de Ciencias Exactas, Físicas y Naturales. Instituto de Botánica Darwinion; Argentina
Subject
Missing Data
Morphological Phylogenies
Parsimony
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/69010

Handle
http://hdl.handle.net/11336/69010
Citation
Prevosti, Francisco Juan; Chemisquy, Maria Amelia (June 2010). The impact of missing data on real morphological phylogenies: Influence of the number and distribution of missing entries. Cladistics 26(3): 326-339. Wiley Blackwell Publishing, Inc.
ISSN
0748-3007
DOI
10.1111/j.1096-0031.2009.00289.x
Publisher page
https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1096-0031.2009.00289.x