Comparing genomic prediction models by means of cross validation

Authors
Schrauf, Matías Florián; Campos, Gustavo de los; Munilla Leguizamón, Sebastián
Publication year
2021
Language
English
Resource type
article
Status
published version
Description
Affiliation: Schrauf, Matías Florián. Universidad de Buenos Aires. Facultad de Agronomía. Buenos Aires, Argentina.
Affiliation: Schrauf, Matías Florián. Wageningen University and Research. Wageningen Livestock Research. Animal Breeding and Genomics. Wageningen, Netherlands.
Affiliation: Campos, Gustavo de los. Michigan State University. Departments of Epidemiology, Biostatistics, Statistics and Probability. Institute for Quantitative Health Science and Engineering. East Lansing, MI, United States.
Affiliation: Munilla Leguizamón, Sebastián. Universidad de Buenos Aires. Facultad de Agronomía. Buenos Aires, Argentina.
Affiliation: Munilla Leguizamón, Sebastián. Universidad de Buenos Aires. Facultad de Ciencias Veterinarias. Instituto de Investigaciones en Producción Animal (INPA). Buenos Aires, Argentina.
Affiliation: Munilla Leguizamón, Sebastián. CONICET - Universidad de Buenos Aires. Facultad de Ciencias Veterinarias. Instituto de Investigaciones en Producción Animal (INPA). Buenos Aires, Argentina.
In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those parameters in the model which are not typically estimated from the data (so-called “hyper-parameters”). When the focus is placed on predictions, most of these decisions are made with the aim of optimizing predictive accuracy. Here we discuss, and illustrate using publicly available crop datasets, the use of cross validation to make many such decisions. In particular, we emphasize the importance of paired comparisons to achieve high power in the comparison between candidate models, as well as the need to define notions of relevance in the difference between their performances. Regarding the latter, we borrow the idea of equivalence margins from clinical research and introduce new statistical tests. We conclude that most hyper-parameters can be learnt from the data by either minimizing REML or by using weakly-informative priors, with good predictive results. In particular, the default options in a popular software package are generally competitive with the optimal values. With regard to the performance assessments themselves, we conclude that paired k-fold cross validation is a generally applicable and statistically powerful methodology to assess differences in model accuracies. Coupled with the definition of equivalence margins based on expected genetic gain, it becomes a useful tool for breeders.
tables, graphs.
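Illustrative note: the paired comparison described in the abstract can be sketched roughly as follows. This is a generic paired interval on per-fold accuracy differences, not the statistical tests introduced in the article. The function name `paired_fold_comparison`, the fold accuracies, the margin of 0.05 and the critical value are all assumptions chosen for illustration. Because cross-validation folds share training data, the interval is only approximate.

```python
import math
import statistics


def paired_fold_comparison(acc_a, acc_b, margin, t_crit):
    """Compare two models using accuracies measured on the SAME CV folds.

    acc_a, acc_b: per-fold accuracies (e.g. correlations) of models A and B.
    margin: hypothetical equivalence margin for the accuracy difference.
    t_crit: Student t critical value for k - 1 degrees of freedom,
            supplied by the caller (assumption: looked up from a table).
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need the same folds (at least two) for both models")
    # Pairing: take the difference within each fold, so fold-to-fold
    # variation shared by both models cancels out.
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    k = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(k)
    lower = mean_diff - t_crit * se
    upper = mean_diff + t_crit * se
    return {
        "mean_diff": mean_diff,
        "se": se,
        "ci": (lower, upper),
        # The interval excludes zero: the difference is detectable.
        "differs_from_zero": lower > 0 or upper < 0,
        # The interval lies inside (-margin, margin): practically equivalent.
        "within_margin": -margin < lower and upper < margin,
    }


# Made-up accuracies for five folds; 2.776 is the two-sided 95% t value
# for 4 degrees of freedom.
acc_a = [0.52, 0.48, 0.55, 0.50, 0.47]
acc_b = [0.50, 0.47, 0.53, 0.49, 0.45]
result = paired_fold_comparison(acc_a, acc_b, margin=0.05, t_crit=2.776)
print(result)
```

In this made-up example the accuracies vary more between folds than between models, yet the within-fold differences are consistent, which is why pairing gives a tight interval. The interval excludes zero but sits inside the margin, so the difference is detectable without being practically relevant under the assumed margin.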
Source
Frontiers in Plant Science
Vol.12
art. 734512
http://www.frontiersin.org
Subject
GENOMIC SELECTION
CROSS VALIDATION
PLANT BREEDING
GENOMIC MODELS
MODEL SELECTION
Access level
open access
Terms of use
open access
Repository
FAUBA Digital (UBA-FAUBA)
Institution
Universidad de Buenos Aires. Facultad de Agronomía
OAI identifier
snrd:2021schrauf

id FAUBA_5f555d43ce61200a4aef39ffd3cf582b
oai_identifier_str snrd:2021schrauf
network_acronym_str FAUBA
repository_id_str 2729
network_name_str FAUBA Digital (UBA-FAUBA)
publishDate 2021
dc.date.none.fl_str_mv 2021
dc.type.none.fl_str_mv info:eu-repo/semantics/article
publishedVersion
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv 10.3389/fpls.2021.734512
http://ri.agro.uba.ar/greenstone3/library/collection/arti/document/2021schrauf
identifier_str_mv 10.3389/fpls.2021.734512
url http://ri.agro.uba.ar/greenstone3/library/collection/arti/document/2021schrauf
dc.language.none.fl_str_mv eng
language eng
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
openAccess
http://ri.agro.uba.ar/greenstone3/library/page/biblioteca#section4
eu_rights_str_mv openAccess
rights_invalid_str_mv openAccess
http://ri.agro.uba.ar/greenstone3/library/page/biblioteca#section4
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv Frontiers in Plant Science
Vol.12
art. 734512
http://www.frontiersin.org
reponame:FAUBA Digital (UBA-FAUBA)
instname:Universidad de Buenos Aires. Facultad de Agronomía
reponame_str FAUBA Digital (UBA-FAUBA)
collection FAUBA Digital (UBA-FAUBA)
instname_str Universidad de Buenos Aires. Facultad de Agronomía
repository.name.fl_str_mv FAUBA Digital (UBA-FAUBA) - Universidad de Buenos Aires. Facultad de Agronomía
repository.mail.fl_str_mv martino@agro.uba.ar;berasa@agro.uba.ar