Comparison of algorithms to infer genetic population structure from unlinked molecular markers

Authors
Peña Malavera, Andrea Natalia; Fernandez, Elmer Andres; Bruno, Cecilia Ines; Balzarini, Monica Graciela
Publication year
2014
Language
English
Resource type
article
Status
published version
Description
Identifying population genetic structure (PGS) is crucial for breeding and conservation. Several clustering algorithms are available to identify the underlying PGS in genetic data of maize genotypes. In this work, six methods to identify PGS from unlinked molecular marker data were compared using simulated and experimental data consisting of multilocus-biallelic genotypes. Datasets were delineated under different biological scenarios characterized by three levels of genetic divergence among populations (low, medium, and high FST) and two numbers of sub-populations (K=3 and K=5). The relative performance of hierarchical and non-hierarchical clustering, model-based clustering (STRUCTURE), and neural-network clustering (SOM-RP-Q) was evaluated, using the clustering error rate of genotypes into discrete sub-populations as the comparison criterion. In scenarios with a high level of divergence among genotype groups, all methods performed well. With a moderate level of genetic divergence (FST=0.2), SOM-RP-Q and STRUCTURE performed better than hierarchical and non-hierarchical clustering. In all simulated scenarios with low genetic divergence and in the experimental SNP maize panel (largely unlinked), SOM-RP-Q achieved the lowest clustering error rate. The SOM algorithm used here is more effective than the other evaluated methods for sparse unlinked genetic data.
Affiliation: Peña Malavera, Andrea Natalia. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Córdoba. Facultad de Ciencias Agropecuarias; Argentina
Affiliation: Fernandez, Elmer Andres. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Córdoba. Facultad de Ciencias Agropecuarias; Argentina
Affiliation: Bruno, Cecilia Ines. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Catolica de Córdoba. Facultad de Ingeniería; Argentina
Affiliation: Balzarini, Monica Graciela. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Córdoba. Facultad de Ciencias Agropecuarias; Argentina
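The abstract uses the clustering error rate as its comparison criterion but this record does not give the exact definition. As a rough illustration only, the sketch below assumes one common reading: the fraction of genotypes assigned to the wrong sub-population after cluster labels are matched one-to-one to the true sub-populations in the most favorable way. The function name `clustering_error_rate` and the toy labels are hypothetical, and the exhaustive search over permutations is only practical for small K such as the K=3 and K=5 scenarios mentioned above.

```python
from itertools import permutations


def clustering_error_rate(true_labels, cluster_labels):
    """Fraction of genotypes misassigned under the best one-to-one
    matching of cluster ids to true sub-population ids.

    Assumed definition for illustration; not taken from the paper.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    if len(true_labels) == 0:
        raise ValueError("label sequences must be non-empty")

    true_ids = sorted(set(true_labels))
    cluster_ids = sorted(set(cluster_labels))

    # Pad the shorter id list with None so every permutation
    # defines a one-to-one mapping even when the counts differ.
    size = max(len(true_ids), len(cluster_ids))
    true_ids += [None] * (size - len(true_ids))
    cluster_ids += [None] * (size - len(cluster_ids))

    best_correct = 0
    for perm in permutations(true_ids):
        mapping = dict(zip(cluster_ids, perm))
        correct = sum(
            1 for t, c in zip(true_labels, cluster_labels) if mapping[c] == t
        )
        best_correct = max(best_correct, correct)

    return 1.0 - best_correct / len(true_labels)


# Toy example: 9 genotypes from K=3 sub-populations, one misassigned.
true_pops = [0, 0, 0, 1, 1, 1, 2, 2, 2]
clusters = [2, 2, 2, 0, 0, 1, 1, 1, 1]
error = clustering_error_rate(true_pops, clusters)
```

Because cluster ids are arbitrary, a clustering that only renames the sub-populations scores an error rate of zero under this definition.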
Subject
Cluster Analysis
Multilocus-Biallelic Genotypes
Plant Breeding
Self-Organizing Maps
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/34261

Handle
http://hdl.handle.net/11336/34261
Citation
Peña Malavera, Andrea Natalia; Fernandez, Elmer Andres; Bruno, Cecilia Ines; Balzarini, Monica Graciela; Comparison of algorithms to infer genetic population structure from unlinked molecular markers; Berkeley Electronic Press; Statistical Applications In Genetics And Molecular Biology; 13; 4; 6-2014; 391-402
ISSN
1544-6115
DOI
10.1515/sagmb-2013-0006
Publisher URL
https://www.degruyter.com/view/j/sagmb.2014.13.issue-4/sagmb-2013-0006/sagmb-2013-0006.xml