Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods

Autores: Goloboff, Pablo Augusto
Año de publicación: 2014
Idioma: inglés
Tipo de recurso: artículo
Estado: versión publicada
Descripción: Three different types of data sets, for which the uniquely most parsimonious tree can be known exactly but is hard to find with heuristic tree search methods, are studied. Tree searches are complicated more by the shape of the tree landscape (i.e. the distribution of homoplasy on different trees) than by the sheer abundance of homoplasy or character conflict. Data sets of Type 1 are those constructed by Radel et al. (2013). Data sets of Type 2 present a very rugged landscape, with narrow peaks and valleys, but relatively low amounts of homoplasy. For such a tree landscape, subjecting the trees to TBR and saving suboptimal trees produces much better results when the sequence of clipping for the tree branches is randomized instead of fixed. An unexpected finding for data sets of Types 1 and 2 is that starting a search from a random tree instead of a random addition sequence Wagner tree may increase the probability that the search finds the most parsimonious tree; a small artificial example where these probabilities can be calculated exactly is presented. Data sets of Type 3, the most difficult data sets studied here, comprise only congruent characters, and a single island with only one most parsimonious tree. Even if there is a single island, missing entries create a very flat landscape which is difficult to traverse with tree search algorithms because the number of equally parsimonious trees that need to be saved and swapped to effectively move around the plateaus is too large. Minor modifications of the parameters of tree drifting, ratchet, and sectorial searches allow travelling around these plateaus much more efficiently than saving and swapping large numbers of equally parsimonious trees with TBR. For these data sets, two new related criteria for selecting taxon addition sequences in Wagner trees (the “selected” and “informative” addition sequences) produce much better results than the standard random or closest addition sequences. These new methods for Wagner trees and for moving around plateaus can be useful when analyzing phylogenomic data sets formed by concatenation of genes with uneven taxon representation (“sparse” supermatrices), which are likely to present a tree landscape with extensive plateaus.
Fil: Goloboff, Pablo Augusto. Fundación Miguel Lillo; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Tucumán; Argentina
Materia: Phylogeny
Tree Searches
Parsimony
Nivel de accesibilidad: acceso abierto
Condiciones de uso: https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
Repositorio
Institución: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI Identificador: oai:ri.conicet.gov.ar:11336/7262

Acceder

id	CONICETDig_e9c07b1073dbc0a988171aa2229d0270
oai_identifier_str	oai:ri.conicet.gov.ar:11336/7262
network_acronym_str	CONICETDig
repository_id_str	3498
network_name_str	CONICET Digital (CONICET)
spelling	Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methodsGoloboff, Pablo AugustoPhylogenyTree SearchesParsimonyhttps://purl.org/becyt/ford/1.6https://purl.org/becyt/ford/1Three different types of data sets, for which the uniquely most parsimonious tree can be known exactly but is hard to find with heuristic tree search methods, are studied. Tree searches are complicated more by the shape of the tree landscape (i.e. the distribution of homoplasy on different trees) than by the sheer abundance of homoplasy or character conflict. Data sets of Type 1 are those constructed by Radel et al. (2013). Data sets of Type 2 present a very rugged landscape, with narrow peaks and valleys, but relatively low amounts of homoplasy. For such a tree landscape, subjecting the trees to TBR and saving suboptimal trees produces much better results when the sequence of clipping for the tree branches is randomized instead of fixed. An unexpected finding for data sets of Types 1 and 2 is that starting a search from a random tree instead of a random addition sequence Wagner tree may increase the probability that the search finds the most parsimonious tree; a small artificial example where these probabilities can be calculated exactly is presented. Data sets of Type 3, the most difficult data sets studied here, comprise only congruent characters, and a single island with only one most parsimonious tree. Even if there is a single island, missing entries create a very flat landscape which is difficult to traverse with tree search algorithms because the number of equally parsimonious trees that need to be saved and swapped to effectively move around the plateaus is too large. Minor modifications of the parameters of tree drifting, ratchet, and sectorial searches allow travelling around these plateaus much more efficiently than saving and swapping large numbers of equally parsimonious trees with TBR. For these data sets, two new related criteria for selecting taxon addition sequences in Wagner trees (the “selected” and “informative” addition sequences) produce much better results than the standard random or closest addition sequences. These new methods for Wagner trees and for moving around plateaus can be useful when analyzing phylogenomic data sets formed by concatenation of genes with uneven taxon representation (“sparse” supermatrices), which are likely to present a tree landscape with extensive plateaus.Fil: Goloboff, Pablo Augusto. Fundación Miguel Lillo; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Tucumán; ArgentinaElsevier2014-06info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionhttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdfapplication/pdfhttp://hdl.handle.net/11336/7262Goloboff, Pablo Augusto; Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods; Elsevier; Molecular Phylogenetics And Evolution; 79; 6-2014; 118-1311055-7903enginfo:eu-repo/semantics/altIdentifier/url/http://www.sciencedirect.com/science/article/pii/S1055790314002218info:eu-repo/semantics/altIdentifier/doi/10.1016/j.ympev.2014.06.008info:eu-repo/semantics/altIdentifier/doi/info:eu-repo/semantics/openAccesshttps://creativecommons.org/licenses/by-nc-nd/2.5/ar/reponame:CONICET Digital (CONICET)instname:Consejo Nacional de Investigaciones Científicas y Técnicas2026-06-04T11:11:25Zoai:ri.conicet.gov.ar:11336/7262instacron:CONICETInstitucionalhttp://ri.conicet.gov.ar/Organismo científico-tecnológicoNo correspondehttp://ri.conicet.gov.ar/oai/requestdasensio@conicet.gov.ar; lcarlino@conicet.gov.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:34982026-06-04 11:11:26.343CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicasfalse
dc.title.none.fl_str_mv	Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods
title	Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods
spellingShingle	Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods Goloboff, Pablo Augusto Phylogeny Tree Searches Parsimony
title_short	Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods
title_full	Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods
title_fullStr	Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods
title_full_unstemmed	Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods
title_sort	Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods
dc.creator.none.fl_str_mv	Goloboff, Pablo Augusto
author	Goloboff, Pablo Augusto
author_facet	Goloboff, Pablo Augusto
author_role	author
dc.subject.none.fl_str_mv	Phylogeny Tree Searches Parsimony
topic	Phylogeny Tree Searches Parsimony
purl_subject.fl_str_mv	https://purl.org/becyt/ford/1.6 https://purl.org/becyt/ford/1
dc.description.none.fl_txt_mv	Three different types of data sets, for which the uniquely most parsimonious tree can be known exactly but is hard to find with heuristic tree search methods, are studied. Tree searches are complicated more by the shape of the tree landscape (i.e. the distribution of homoplasy on different trees) than by the sheer abundance of homoplasy or character conflict. Data sets of Type 1 are those constructed by Radel et al. (2013). Data sets of Type 2 present a very rugged landscape, with narrow peaks and valleys, but relatively low amounts of homoplasy. For such a tree landscape, subjecting the trees to TBR and saving suboptimal trees produces much better results when the sequence of clipping for the tree branches is randomized instead of fixed. An unexpected finding for data sets of Types 1 and 2 is that starting a search from a random tree instead of a random addition sequence Wagner tree may increase the probability that the search finds the most parsimonious tree; a small artificial example where these probabilities can be calculated exactly is presented. Data sets of Type 3, the most difficult data sets studied here, comprise only congruent characters, and a single island with only one most parsimonious tree. Even if there is a single island, missing entries create a very flat landscape which is difficult to traverse with tree search algorithms because the number of equally parsimonious trees that need to be saved and swapped to effectively move around the plateaus is too large. Minor modifications of the parameters of tree drifting, ratchet, and sectorial searches allow travelling around these plateaus much more efficiently than saving and swapping large numbers of equally parsimonious trees with TBR. For these data sets, two new related criteria for selecting taxon addition sequences in Wagner trees (the “selected” and “informative” addition sequences) produce much better results than the standard random or closest addition sequences. These new methods for Wagner trees and for moving around plateaus can be useful when analyzing phylogenomic data sets formed by concatenation of genes with uneven taxon representation (“sparse” supermatrices), which are likely to present a tree landscape with extensive plateaus. Fil: Goloboff, Pablo Augusto. Fundación Miguel Lillo; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Tucumán; Argentina
description	Three different types of data sets, for which the uniquely most parsimonious tree can be known exactly but is hard to find with heuristic tree search methods, are studied. Tree searches are complicated more by the shape of the tree landscape (i.e. the distribution of homoplasy on different trees) than by the sheer abundance of homoplasy or character conflict. Data sets of Type 1 are those constructed by Radel et al. (2013). Data sets of Type 2 present a very rugged landscape, with narrow peaks and valleys, but relatively low amounts of homoplasy. For such a tree landscape, subjecting the trees to TBR and saving suboptimal trees produces much better results when the sequence of clipping for the tree branches is randomized instead of fixed. An unexpected finding for data sets of Types 1 and 2 is that starting a search from a random tree instead of a random addition sequence Wagner tree may increase the probability that the search finds the most parsimonious tree; a small artificial example where these probabilities can be calculated exactly is presented. Data sets of Type 3, the most difficult data sets studied here, comprise only congruent characters, and a single island with only one most parsimonious tree. Even if there is a single island, missing entries create a very flat landscape which is difficult to traverse with tree search algorithms because the number of equally parsimonious trees that need to be saved and swapped to effectively move around the plateaus is too large. Minor modifications of the parameters of tree drifting, ratchet, and sectorial searches allow travelling around these plateaus much more efficiently than saving and swapping large numbers of equally parsimonious trees with TBR. For these data sets, two new related criteria for selecting taxon addition sequences in Wagner trees (the “selected” and “informative” addition sequences) produce much better results than the standard random or closest addition sequences. These new methods for Wagner trees and for moving around plateaus can be useful when analyzing phylogenomic data sets formed by concatenation of genes with uneven taxon representation (“sparse” supermatrices), which are likely to present a tree landscape with extensive plateaus.
publishDate	2014
dc.date.none.fl_str_mv	2014-06
dc.type.none.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo
format	article
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://hdl.handle.net/11336/7262 Goloboff, Pablo Augusto; Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods; Elsevier; Molecular Phylogenetics And Evolution; 79; 6-2014; 118-131 1055-7903
url	http://hdl.handle.net/11336/7262
identifier_str_mv	Goloboff, Pablo Augusto; Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods; Elsevier; Molecular Phylogenetics And Evolution; 79; 6-2014; 118-131 1055-7903
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/http://www.sciencedirect.com/science/article/pii/S1055790314002218 info:eu-repo/semantics/altIdentifier/doi/10.1016/j.ympev.2014.06.008 info:eu-repo/semantics/altIdentifier/doi/
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
eu_rights_str_mv	openAccess
rights_invalid_str_mv	https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
dc.format.none.fl_str_mv	application/pdf application/pdf
dc.publisher.none.fl_str_mv	Elsevier
publisher.none.fl_str_mv	Elsevier
dc.source.none.fl_str_mv	reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str	CONICET Digital (CONICET)
collection	CONICET Digital (CONICET)
instname_str	Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv	CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv	dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
_version_	1867099236686888960
score	12.832306

Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods

Publicaciones similares