Relative performance of cluster algorithms and validation indices in maize genome-wide structure patterns
- Authors
- Videla, María Eugenia; Iglesias, Juliana; Bruno, Cecilia Inés
- Year of publication
- 2021
- Language
- English
- Resource type
- article
- Status
- accepted version
- Description
A number of clustering algorithms are available to depict population genetic structure (PGS) with genomic data; however, there is no consensus on which methods perform best. We conducted a simulation study of three PGS scenarios with subpopulations k = 2, 5 and 10, recreating several maize genomes as a model to: (1) compare three well-known clustering methods: UPGMA, k-means, and a Bayesian method (BM); (2) assess four internal validation indices: CH, Connectivity, Dunn and Silhouette, to determine the reliable number of groups defining a PGS; and (3) estimate the misclassification rate for each validation index. Moreover, a publicly available maize dataset was used to illustrate the outcomes of our simulation. BM was the best method to classify individuals in all tested scenarios, without assignment errors. Conversely, UPGMA was the method with the highest misclassification rate. In scenarios with 5 and 10 subpopulations, CH and Connectivity indices had the maximum underestimation of group number for all cluster algorithms. Dunn and Silhouette indices showed the best performance with BM. Nevertheless, since Silhouette measures the degree of confidence in cluster assignment, and BM measures the probability of cluster membership, these results should be considered with caution. In this study, BM proved efficient at depicting the PGS in both simulated and real maize datasets. This study offers a robust alternative to unveil the existing PGS, thereby facilitating population studies and breeding strategies in maize programs. Moreover, the present findings may have implications for other crop species.
EEA Pergamino
Affiliation: Videla, María Eugenia. Universidad Nacional de Córdoba. Facultad de Ciencias Agropecuarias. Estadística y Biometría; Argentina
Affiliation: Videla, María Eugenia. Consejo Nacional de Investigaciones Científicas y Tecnológicas. Unidad de Fitopatología y Modelización Agrícola (UFyMA-CONICET); Argentina
Affiliation: Videla, María Eugenia. Universidad Nacional de Villa María; Argentina
Affiliation: Iglesias, Juliana. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria Pergamino. Departamento de Maíz; Argentina
Affiliation: Iglesias, Juliana. Universidad Nacional del Noroeste de la Provincia de Buenos Aires. Escuela de Agrarias, Naturales y Ambientales; Argentina
Affiliation: Bruno, Cecilia. Universidad Nacional de Córdoba. Facultad de Ciencias Agropecuarias. Estadística y Biometría; Argentina
Affiliation: Bruno, Cecilia. Consejo Nacional de Investigaciones Científicas y Tecnológicas. Unidad de Fitopatología y Modelización Agrícola (UFyMA-CONICET); Argentina
- Source
- Euphytica 217 (10) : 195 (October 2021)
- Subject
-
Maíz
Genética de Poblaciones
Genomas
Mejoramiento Genético
Maize
Population Genetics
Genomes
Genetic Improvement
Unsupervised Learning
Population Genetic Structure
Multivariate Technique
Outcome Misclassification
- Access level
- restricted access
- Terms of use
- Repository
- Institution
- Instituto Nacional de Tecnología Agropecuaria
- OAI identifier
- oai:localhost:20.500.12123/11153
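The kind of comparison summarized in the description (cluster simulated genotypes, pick the number of groups with an internal validation index, then measure misclassification against the known subpopulations) can be illustrated with a minimal sketch. This is not the authors' simulation or code: the toy genotype generator, the Manhattan distance, the marker counts, and all function names below are assumptions made for illustration, and only UPGMA with the Silhouette index is shown.

```python
import random
from itertools import combinations

def simulate(k, n_per_pop, n_markers, seed=0):
    """Toy genotypes (allele counts 0/1/2); hypothetical, strongly separated subpopulations."""
    rng = random.Random(seed)
    data, truth = [], []
    for pop in range(k):
        # Assumption: each subpopulation is near fixation on its own subset of markers.
        freqs = [0.95 if j % k == pop else 0.05 for j in range(n_markers)]
        for _ in range(n_per_pop):
            data.append([(rng.random() < f) + (rng.random() < f) for f in freqs])
            truth.append(pop)
    return data, truth

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def upgma(dist, k):
    """Average-linkage (UPGMA) agglomeration, stopped when k clusters remain."""
    def avg(pair):
        a, b = pair
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=avg)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def silhouette(dist, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    groups = {}
    for i, c in enumerate(labels):
        groups.setdefault(c, []).append(i)
    total = 0.0
    for i, c in enumerate(labels):
        own = groups[c]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in g) / len(g)
                for other, g in groups.items() if other != c)
        total += (b - a) / max(a, b)
    return total / len(labels)

def misclassification_rate(truth, labels):
    """Share of individuals outside the majority true group of their cluster."""
    errors = 0
    for c in set(labels):
        members = [truth[i] for i, l in enumerate(labels) if l == c]
        errors += len(members) - max(members.count(t) for t in set(members))
    return errors / len(truth)

geno, truth = simulate(k=3, n_per_pop=10, n_markers=60)
dist = [[manhattan(a, b) for b in geno] for a in geno]
scores = {k: silhouette(dist, upgma(dist, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
error = misclassification_rate(truth, upgma(dist, best_k))
print(best_k, error)
```

Because the toy subpopulations are far more separated than realistic maize populations, this sketch says nothing about the relative performance reported in the description; it only shows how an index-selected group number and a misclassification rate can be computed together.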
View the full record metadata
id | INTADig_cd37f8be32803239259b5f7f8b104373 |
---|---|
oai_identifier_str | oai:localhost:20.500.12123/11153 |
network_acronym_str | INTADig |
repository_id_str | l |
network_name_str | INTA Digital (INTA) |
dc.title.none.fl_str_mv | Relative performance of cluster algorithms and validation indices in maize genome-wide structure patterns |
title | Relative performance of cluster algorithms and validation indices in maize genome-wide structure patterns |
dc.creator.none.fl_str_mv | Videla, María Eugenia; Iglesias, Juliana; Bruno, Cecilia Inés |
author | Videla, María Eugenia |
author_facet | Videla, María Eugenia; Iglesias, Juliana; Bruno, Cecilia Inés |
author_role | author |
author2 | Iglesias, Juliana; Bruno, Cecilia Inés |
author2_role | author author |
dc.subject.none.fl_str_mv | Maíz; Genética de Poblaciones; Genomas; Mejoramiento Genético; Maize; Population Genetics; Genomes; Genetic Improvement; Unsupervised Learning; Population Genetic Structure; Multivariate Technique; Outcome Misclassification |
topic | Maíz; Genética de Poblaciones; Genomas; Mejoramiento Genético; Maize; Population Genetics; Genomes; Genetic Improvement; Unsupervised Learning; Population Genetic Structure; Multivariate Technique; Outcome Misclassification |
publishDate | 2021 |
dc.date.none.fl_str_mv | 2021-09 2022-02-15T14:34:44Z 2022-02-15T14:34:44Z |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/acceptedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format | article |
status_str | acceptedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/20.500.12123/11153 https://link.springer.com/article/10.1007/s10681-021-02926-5 1573-5060 (online) 0014-2336 https://doi.org/10.1007/s10681-021-02926-5 |
url | http://hdl.handle.net/20.500.12123/11153 https://link.springer.com/article/10.1007/s10681-021-02926-5 https://doi.org/10.1007/s10681-021-02926-5 |
identifier_str_mv | 1573-5060 (online) 0014-2336 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repograntAgreement/INTA/2019-PE-E6-I114-001/2019-PE-E6-I114-001/AR./Caracterización de la diversidad genética de plantas, animales y microorganismos mediante herramientas de genómica aplicada. |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/restrictedAccess |
eu_rights_str_mv | restrictedAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | Springer Nature |
publisher.none.fl_str_mv | Springer Nature |
dc.source.none.fl_str_mv | Euphytica 217 (10) : 195 (October 2021) reponame:INTA Digital (INTA) instname:Instituto Nacional de Tecnología Agropecuaria |
reponame_str | INTA Digital (INTA) |
collection | INTA Digital (INTA) |
instname_str | Instituto Nacional de Tecnología Agropecuaria |
repository.name.fl_str_mv | INTA Digital (INTA) - Instituto Nacional de Tecnología Agropecuaria |
repository.mail.fl_str_mv | tripaldi.nicolas@inta.gob.ar |
_version_ | 1846143543432708096 |
score | 12.712165 |