Comparing genomic prediction models by means of cross validation
- Authors
- Schrauf, Matías Florián; Campos, Gustavo de los; Munilla Leguizamón, Sebastián
- Year of publication
- 2021
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Affiliation: Schrauf, Matías Florián. Universidad de Buenos Aires. Facultad de Agronomía. Buenos Aires, Argentina.
Affiliation: Schrauf, Matías Florián. Wageningen University and Research. Wageningen Livestock Research. Animal Breeding and Genomics. Wageningen, Netherlands.
Affiliation: Campos, Gustavo de los. Michigan State University. Departments of Epidemiology, Biostatistics, Statistics and Probability. Institute for Quantitative Health Science and Engineering. East Lansing, MI, United States.
Affiliation: Munilla Leguizamón, Sebastián. Universidad de Buenos Aires. Facultad de Agronomía. Buenos Aires, Argentina.
Affiliation: Munilla Leguizamón, Sebastián. Universidad de Buenos Aires. Facultad de Ciencias Veterinarias. Instituto de Investigaciones en Producción Animal (INPA). Buenos Aires, Argentina.
Affiliation: Munilla Leguizamón, Sebastián. CONICET - Universidad de Buenos Aires. Facultad de Ciencias Veterinarias. Instituto de Investigaciones en Producción Animal (INPA). Buenos Aires, Argentina.
In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those parameters in the model which are not typically estimated from the data (so-called “hyper-parameters”). When the focus is placed on predictions, most of these decisions are made so as to optimize predictive accuracy. Here we discuss the use of cross validation to make many such decisions, and illustrate it using publicly available crop datasets. In particular, we emphasize the importance of paired comparisons to achieve high power in the comparison between candidate models, as well as the need to define notions of relevance in the difference between their performances. Regarding the latter, we borrow the idea of equivalence margins from clinical research and introduce new statistical tests. We conclude that most hyper-parameters can be learnt from the data by either minimizing REML or by using weakly-informative priors, with good predictive results. In particular, the default options in a popular software package are generally competitive with the optimal values. With regard to the performance assessments themselves, we conclude that paired k-fold cross validation is a generally applicable and statistically powerful methodology to assess differences in model accuracies. Coupled with the definition of equivalence margins based on expected genetic gain, it becomes a useful tool for breeders.
Tables, graphs.
- Source
- Frontiers in Plant Science
Vol. 12
art. 734512
http://www.frontiersin.org
- Subject
- GENOMIC SELECTION
CROSS VALIDATION
PLANT BREEDING
GENOMIC MODELS
MODEL SELECTION
- Access level
- open access
- Terms of use
- open access
- Repository
- Institution
- Universidad de Buenos Aires. Facultad de Agronomía
- OAI identifier
- snrd:2021schrauf
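The paired k-fold cross validation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce the statistical tests introduced in the article: the simulated data, the two single-predictor models, the function names, and the margin value are all assumptions made for illustration. The point it shows is that both candidate models are scored on exactly the same folds, so the per-fold accuracy differences can be analyzed directly, and that the resulting interval can be compared against a user-chosen equivalence margin.

```python
import math
import random
import statistics


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def fit_predict(x_train, y_train, x_test):
    """Single-predictor least squares: fit on training data, predict test data."""
    mx, my = statistics.fmean(x_train), statistics.fmean(y_train)
    sxx = sum((a - mx) ** 2 for a in x_train)
    slope = sum((a - mx) * (b - my) for a, b in zip(x_train, y_train)) / sxx
    return [my + slope * (a - mx) for a in x_test]


def paired_kfold(x_a, x_b, y, k=10, seed=1):
    """Return per-fold accuracy differences (model A minus model B).

    Both models are evaluated on exactly the same folds, which is what
    makes the comparison paired.
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    diffs = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        y_tr = [y[i] for i in train]
        y_te = [y[i] for i in test]
        acc = []
        for x in (x_a, x_b):
            pred = fit_predict([x[i] for i in train], y_tr, [x[i] for i in test])
            acc.append(pearson(pred, y_te))
        diffs.append(acc[0] - acc[1])
    return diffs


def summarize(diffs, margin, t_crit):
    """Mean difference, its confidence interval, and an interval-in-margin check."""
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    low, high = mean - t_crit * se, mean + t_crit * se
    equivalent = -margin < low and high < margin
    return mean, (low, high), equivalent


# Simulated data (assumption): predictor x_a is more informative than x_b.
rng = random.Random(0)
n = 200
x_a = [rng.gauss(0, 1) for _ in range(n)]
x_b = [rng.gauss(0, 1) for _ in range(n)]
y = [a + 0.3 * b + rng.gauss(0, 1) for a, b in zip(x_a, x_b)]

diffs = paired_kfold(x_a, x_b, y, k=10)
# 1.833 is the 95th percentile of Student's t with 9 degrees of freedom,
# giving a 90% interval for 10 folds; the margin 0.05 is a hypothetical choice.
mean, ci, equivalent = summarize(diffs, margin=0.05, t_crit=1.833)
print(f"mean difference = {mean:.3f}, 90% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print("within equivalence margin:", equivalent)
```

In practice the margin would be derived from what counts as a relevant difference for the breeding program (the abstract suggests expected genetic gain), and fold-wise differences are not strictly independent, so the interval here should be read as an approximation.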
id | FAUBA_5f555d43ce61200a4aef39ffd3cf582b |
---|---|
oai_identifier_str | snrd:2021schrauf |
network_acronym_str | FAUBA |
repository_id_str | 2729 |
network_name_str | FAUBA Digital (UBA-FAUBA) |
dc.title.none.fl_str_mv | Comparing genomic prediction models by means of cross validation |
dc.creator.none.fl_str_mv | Schrauf, Matías Florián; Campos, Gustavo de los; Munilla Leguizamón, Sebastián |
author | Schrauf, Matías Florián |
author_role | author |
author2 | Campos, Gustavo de los; Munilla Leguizamón, Sebastián |
author2_role | author; author |
dc.subject.none.fl_str_mv | GENOMIC SELECTION; CROSS VALIDATION; PLANT BREEDING; GENOMIC MODELS; MODEL SELECTION |
publishDate | 2021 |
dc.date.none.fl_str_mv | 2021 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article; publishedVersion; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | 10.3389/fpls.2021.734512; http://ri.agro.uba.ar/greenstone3/library/collection/arti/document/2021schrauf |
url | http://ri.agro.uba.ar/greenstone3/library/collection/arti/document/2021schrauf |
dc.language.none.fl_str_mv | eng |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; openAccess; http://ri.agro.uba.ar/greenstone3/library/page/biblioteca#section4 |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | Frontiers in Plant Science; Vol.12; art. 734512; http://www.frontiersin.org; reponame:FAUBA Digital (UBA-FAUBA); instname:Universidad de Buenos Aires. Facultad de Agronomía |
reponame_str | FAUBA Digital (UBA-FAUBA) |
instname_str | Universidad de Buenos Aires. Facultad de Agronomía |
repository.name.fl_str_mv | FAUBA Digital (UBA-FAUBA) - Universidad de Buenos Aires. Facultad de Agronomía |
repository.mail.fl_str_mv | martino@agro.uba.ar;berasa@agro.uba.ar |