Comparing Genomic Prediction Models by Means of Cross Validation

Authors: Schrauf, Matías Florián; de los Campos, Gustavo; Munilla Leguizamon, Sebastian
Year of publication: 2021
Language: English
Resource type: article
Status: published version
Description: In the two decades of continuous development of genomic selection, a great variety of models have been proposed to make predictions from the information available in dense marker panels. Besides deciding which particular model to use, practitioners also need to make many minor choices for those parameters in the model which are not typically estimated by the data (so-called “hyper-parameters”). When the focus is placed on predictions, most of these decisions are made in a direction sought to optimize predictive accuracy. Here we discuss and illustrate using publicly available crop datasets the use of cross validation to make many such decisions. In particular, we emphasize the importance of paired comparisons to achieve high power in the comparison between candidate models, as well as the need to define notions of relevance in the difference between their performances. Regarding the latter, we borrow the idea of equivalence margins from clinical research and introduce new statistical tests. We conclude that most hyper-parameters can be learnt from the data by either minimizing REML or by using weakly-informative priors, with good predictive results. In particular, the default options in a popular software are generally competitive with the optimal values. With regard to the performance assessments themselves, we conclude that the paired k-fold cross validation is a generally applicable and statistically powerful methodology to assess differences in model accuracies. Coupled with the definition of equivalence margins based on expected genetic gain, it becomes a useful tool for breeders.
Affiliation: Schrauf, Matías Florián. University of Agriculture Wageningen; Netherlands. Universidad de Buenos Aires. Facultad de Agronomía; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Unidad Ejecutora de Investigaciones en Producción Animal. Universidad de Buenos Aires. Facultad de Ciencias Veterinarias. Unidad Ejecutora de Investigaciones en Producción Animal; Argentina
Affiliation: de los Campos, Gustavo. Michigan State University; United States
Affiliation: Munilla Leguizamon, Sebastian. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Unidad Ejecutora de Investigaciones en Producción Animal. Universidad de Buenos Aires. Facultad de Ciencias Veterinarias. Unidad Ejecutora de Investigaciones en Producción Animal; Argentina. Universidad de Buenos Aires. Facultad de Agronomía; Argentina
Subject: CROSS VALIDATION
GENOMIC MODELS
GENOMIC SELECTION
MODEL SELECTION
PLANT BREEDING
Access level: open access
Terms of use: https://creativecommons.org/licenses/by/2.5/ar/
Repository
Institution: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier: oai:ri.conicet.gov.ar:11336/211734

id	CONICETDig_af9c213d46c79c9ce0707f1a43e19273
network_acronym_str	CONICETDig
repository_id_str	3498
network_name_str	CONICET Digital (CONICET)
purl_subject.fl_str_mv	https://purl.org/becyt/ford/4.1 https://purl.org/becyt/ford/4
dc.date.none.fl_str_mv	2021-11
dc.type.none.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo
format	article
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://hdl.handle.net/11336/211734 Schrauf, Matías Florián; de los Campos, Gustavo; Munilla Leguizamon, Sebastian; Comparing Genomic Prediction Models by Means of Cross Validation; Frontiers Media; Frontiers in Plant Science; 12; 734512; 11-2021; 1-11 1664-462X CONICET Digital CONICET
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/https://www.frontiersin.org/articles/10.3389/fpls.2021.734512/full info:eu-repo/semantics/altIdentifier/doi/10.3389/fpls.2021.734512
dc.format.none.fl_str_mv	application/pdf application/pdf
dc.publisher.none.fl_str_mv	Frontiers Media
repository.mail.fl_str_mv	dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
