Comparing Genomic Prediction Models by Means of Cross Validation

Authors
Schrauf, Matías Florián; de los Campos, Gustavo; Munilla Leguizamon, Sebastian
Publication year
2021
Language
English
Resource type
article
Status
published version
Description
In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those parameters in the model which are not typically estimated from the data (so-called “hyper-parameters”). When the focus is placed on predictions, most of these decisions are made so as to optimize predictive accuracy. Here we discuss, and illustrate using publicly available crop datasets, the use of cross validation to make many such decisions. In particular, we emphasize the importance of paired comparisons to achieve high power in the comparison between candidate models, as well as the need to define notions of relevance in the difference between their performances. Regarding the latter, we borrow the idea of equivalence margins from clinical research and introduce new statistical tests. We conclude that most hyper-parameters can be learnt from the data by either minimizing REML or by using weakly-informative priors, with good predictive results. In particular, the default options in a popular software package are generally competitive with the optimal values. With regard to the performance assessments themselves, we conclude that paired k-fold cross validation is a generally applicable and statistically powerful methodology to assess differences in model accuracies. Coupled with the definition of equivalence margins based on expected genetic gain, it becomes a useful tool for breeders.
Affiliation: Schrauf, Matías Florián. University of Agriculture Wageningen; Netherlands. Universidad de Buenos Aires. Facultad de Agronomía; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Unidad Ejecutora de Investigaciones en Producción Animal. Universidad de Buenos Aires. Facultad de Ciencias Veterinarias. Unidad Ejecutora de Investigaciones en Producción Animal; Argentina
Affiliation: de los Campos, Gustavo. Michigan State University; United States
Affiliation: Munilla Leguizamon, Sebastian. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Unidad Ejecutora de Investigaciones en Producción Animal. Universidad de Buenos Aires. Facultad de Ciencias Veterinarias. Unidad Ejecutora de Investigaciones en Producción Animal; Argentina. Universidad de Buenos Aires. Facultad de Agronomía; Argentina
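The paired k-fold comparison mentioned in the description can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the article: the data are simulated, ridge regression stands in for a genomic prediction model, the two candidates differ only in a penalty value, the function names are hypothetical, and the equivalence check is a generic confidence-interval comparison against a margin rather than the tests the authors introduce. The t-interval also treats folds as independent, which is only an approximation because training sets overlap.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Ridge regression on centered markers (illustrative stand-in model)."""
    mu_x = X_train.mean(axis=0)
    mu_y = y_train.mean()
    Xc = X_train - mu_x
    p = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_train - mu_y))
    return mu_y + (X_test - mu_x) @ beta

def paired_kfold(X, y, lam_a, lam_b, k=10, seed=0):
    """Per-fold accuracy (correlation) of two models evaluated on the SAME folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    acc_a, acc_b = [], []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        for lam, acc in ((lam_a, acc_a), (lam_b, acc_b)):
            pred = ridge_fit_predict(X[train], y[train], X[test], lam)
            acc.append(np.corrcoef(pred, y[test])[0, 1])
    return np.array(acc_a), np.array(acc_b)

def paired_summary(acc_a, acc_b, margin, t_crit=2.262):
    """Mean paired difference, its t-interval, and a margin-based equivalence flag.

    t_crit defaults to the two-sided 95% value for 9 degrees of freedom,
    i.e. it is only appropriate for k = 10 folds.
    """
    d = acc_a - acc_b                      # pairing removes fold-to-fold noise
    mean = d.mean()
    se = d.std(ddof=1) / np.sqrt(len(d))
    lo, hi = mean - t_crit * se, mean + t_crit * se
    equivalent = bool(-margin < lo and hi < margin)
    return {"mean_diff": mean, "ci": (lo, hi), "equivalent": equivalent}

# Simulated marker matrix (genotypes coded 0/1/2) and phenotype; not real data.
rng = np.random.default_rng(1)
n, p = 200, 500
X = rng.integers(0, 3, size=(n, p)).astype(float)
y = X @ rng.normal(0, 0.05, size=p) + rng.normal(0, 1.0, size=n)

acc_a, acc_b = paired_kfold(X, y, lam_a=100.0, lam_b=1000.0)
result = paired_summary(acc_a, acc_b, margin=0.05)
print(result)
```

Because both models are scored on identical folds, the variance of the per-fold difference is typically much smaller than the variance of either accuracy alone, which is where the gain in power comes from. The margin of 0.05 is an arbitrary placeholder; the description suggests deriving it from expected genetic gain.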
Subject
CROSS VALIDATION
GENOMIC MODELS
GENOMIC SELECTION
MODEL SELECTION
PLANT BREEDING
Access level
open access
Terms of use
https://creativecommons.org/licenses/by/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/211734

id CONICETDig_af9c213d46c79c9ce0707f1a43e19273
network_acronym_str CONICETDig
repository_id_str 3498
purl_subject.fl_str_mv https://purl.org/becyt/ford/4.1
https://purl.org/becyt/ford/4
dc.date.none.fl_str_mv 2021-11
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/211734
Schrauf, Matías Florián; de los Campos, Gustavo; Munilla Leguizamon, Sebastian; Comparing Genomic Prediction Models by Means of Cross Validation; Frontiers Media; Frontiers in Plant Science; 12; 734512; 11-2021; 1-11
1664-462X
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://www.frontiersin.org/articles/10.3389/fpls.2021.734512/full
info:eu-repo/semantics/altIdentifier/doi/10.3389/fpls.2021.734512
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by/2.5/ar/
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Frontiers Media
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar