Comparing Genomic Prediction Models by Means of Cross Validation
- Authors
- Schrauf, Matías Florián; de los Campos, Gustavo; Munilla Leguizamon, Sebastian
- Publication year
- 2021
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those parameters in the model which are not typically estimated from the data (so-called “hyper-parameters”). When the focus is placed on prediction, most of these decisions are made so as to optimize predictive accuracy. Here we discuss and illustrate, using publicly available crop datasets, the use of cross validation to make many such decisions. In particular, we emphasize the importance of paired comparisons to achieve high power in the comparison between candidate models, as well as the need to define notions of relevance in the difference between their performances. Regarding the latter, we borrow the idea of equivalence margins from clinical research and introduce new statistical tests. We conclude that most hyper-parameters can be learnt from the data, either by REML estimation or by using weakly-informative priors, with good predictive results. In particular, the default options in a popular software package are generally competitive with the optimal values. With regard to the performance assessments themselves, we conclude that paired k-fold cross validation is a generally applicable and statistically powerful methodology to assess differences in model accuracies. Coupled with the definition of equivalence margins based on expected genetic gain, it becomes a useful tool for breeders.
Affiliation: Schrauf, Matías Florián. University of Agriculture Wageningen; Netherlands. Universidad de Buenos Aires. Facultad de Agronomía; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Unidad Ejecutora de Investigaciones en Producción Animal. Universidad de Buenos Aires. Facultad de Ciencias Veterinarias. Unidad Ejecutora de Investigaciones en Producción Animal; Argentina
Affiliation: de los Campos, Gustavo. Michigan State University; United States
Affiliation: Munilla Leguizamon, Sebastian. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Unidad Ejecutora de Investigaciones en Producción Animal. Universidad de Buenos Aires. Facultad de Ciencias Veterinarias. Unidad Ejecutora de Investigaciones en Producción Animal; Argentina. Universidad de Buenos Aires. Facultad de Agronomía; Argentina
- Subject
- CROSS VALIDATION; GENOMIC MODELS; GENOMIC SELECTION; MODEL SELECTION; PLANT BREEDING
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/211734
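The paired k-fold comparison mentioned in the description can be illustrated with a minimal sketch. It is not the authors' code or their statistical tests: the simulated marker data, the two ridge penalties, and the helper names `ridge_fit`, `paired_kfold`, and `simulate` are all assumptions made for illustration. The point it shows is that both candidate models are scored on the same folds, so the per-fold accuracy differences can be analysed as paired observations.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Fit ridge regression on centered data; return (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta


def paired_kfold(X, y, lam_a, lam_b, k=5, seed=0):
    """Score two ridge penalties on identical folds.

    Returns the per-fold accuracy differences (model A minus model B),
    their mean, and the standard error of that mean.
    Accuracy is the Pearson correlation between predictions and
    observations in the held-out fold.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    diffs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        accs = []
        for lam in (lam_a, lam_b):  # same train/test split for both models
            b0, beta = ridge_fit(X[train], y[train], lam)
            pred = b0 + X[test] @ beta
            accs.append(np.corrcoef(pred, y[test])[0, 1])
        diffs.append(accs[0] - accs[1])
    diffs = np.array(diffs)
    return diffs, diffs.mean(), diffs.std(ddof=1) / np.sqrt(k)


def simulate(n=200, p=50, seed=1):
    """Hypothetical data: genotypes coded 0/1/2 and an additive trait."""
    rng = np.random.default_rng(seed)
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
    beta = rng.normal(0.0, 0.3, size=p)
    y = X @ beta + rng.normal(0.0, 1.0, size=n)
    return X, y


X, y = simulate()
diffs, mean_diff, se = paired_kfold(X, y, lam_a=1.0, lam_b=100.0)
t_stat = mean_diff / se  # paired t-type statistic on the fold differences
print("per-fold differences:", np.round(diffs, 3))
print("mean difference:", round(mean_diff, 3), "t:", round(t_stat, 2))
```

Because the training sets of different folds overlap, the fold-level differences are not fully independent, so the t-type statistic here should be read as approximate. An equivalence margin, as discussed in the description, would be a pre-specified bound against which the mean difference and its uncertainty are compared; that step is left out of this sketch.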
Full record metadata
id |
CONICETDig_af9c213d46c79c9ce0707f1a43e19273 |
---|---|
oai_identifier_str |
oai:ri.conicet.gov.ar:11336/211734 |
network_acronym_str |
CONICETDig |
repository_id_str |
3498 |
network_name_str |
CONICET Digital (CONICET) |
dc.title.none.fl_str_mv |
Comparing Genomic Prediction Models by Means of Cross Validation |
dc.creator.none.fl_str_mv |
Schrauf, Matías Florián; de los Campos, Gustavo; Munilla Leguizamon, Sebastian |
dc.subject.none.fl_str_mv |
CROSS VALIDATION; GENOMIC MODELS; GENOMIC SELECTION; MODEL SELECTION; PLANT BREEDING |
purl_subject.fl_str_mv |
https://purl.org/becyt/ford/4.1 https://purl.org/becyt/ford/4 |
publishDate |
2021 |
dc.date.none.fl_str_mv |
2021-11 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11336/211734; Schrauf, Matías Florián; de los Campos, Gustavo; Munilla Leguizamon, Sebastian; Comparing Genomic Prediction Models by Means of Cross Validation; Frontiers Media; Frontiers in Plant Science; 12; 734512; 11-2021; 1-11; ISSN 1664-462X; CONICET Digital; CONICET |
dc.language.none.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/https://www.frontiersin.org/articles/10.3389/fpls.2021.734512/full info:eu-repo/semantics/altIdentifier/doi/10.3389/fpls.2021.734512 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/2.5/ar/ |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Frontiers Media |
dc.source.none.fl_str_mv |
reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv |
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |