Methodology for the identification of relevant loci for milk traits in dairy cattle, using machine learning algorithms

Authors
Raschia, Maria Agustina; Ríos, Pablo Javier; Maizon, Daniel Omar; Demitrio, Daniel Arturo; Poli, Mario Andres
Year of publication
2022
Language
English
Resource type
article
Status
published version
Description
Machine learning methods have been considered efficient at identifying single nucleotide polymorphisms (SNPs) underlying a trait of interest. This study aimed to construct predictive models using machine learning algorithms to identify loci that best explain the variance in milk traits of dairy cattle. Further objectives were to validate the results by comparison with previously reported relevant regions and to retrieve the pathways overrepresented among the genes flanking relevant SNPs. Regression models using the XGBoost (XGB), LightGBM (LGB), and Random Forest (RF) algorithms were trained using estimated breeding values for milk production (EBVM), milk fat content (EBVF), and milk protein content (EBVP) as phenotypes and genotypes at 40417 SNPs as predictor variables. To evaluate model efficiency, metrics for actual vs. predicted values were determined in validation folds (XGB and LGB) and on out-of-bag data (RF). Fewer than 4500 relevant SNPs were retrieved for each trait. Among the genes flanking them, signaling and transmembrane transporter activities were overrepresented. The models trained:
• Predicted breeding values for animals not included in the dataset.
• Were efficient in identifying a subset of SNPs explaining phenotypic variation.
The results obtained using the XGB and LGB algorithms agreed with previous results. Therefore, the proposed method could be applied in future association studies on milk traits.
Instituto de Genética
Affiliation: Raschia, Maria Agustina. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Genética; Argentina
Affiliation: Ríos, Pablo J. Universidad de Buenos Aires; Argentina
Affiliation: Ríos, Pablo J. Universidad Nacional de La Plata. Facultad de Ciencias Exactas; Argentina
Affiliation: Maizon, Daniel Omar. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria Anguil; Argentina
Affiliation: Maizon, Daniel Omar. Universidad Nacional de La Pampa. Facultad de Agronomía; Argentina
Affiliation: Demitrio, Daniel Arturo. Instituto Nacional de Tecnología Agropecuaria (INTA). Dirección General de Sistemas de Información, Comunicación y Procesos. Gerencia de Informática y Gestión de la Información; Argentina
Affiliation: Demitrio, Daniel Arturo. Universidad Nacional de La Plata. Facultad de Ciencias Exactas; Argentina
Affiliation: Poli, Mario Andres. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Genética; Argentina
Affiliation: Poli, Mario Andres. Universidad del Salvador. Facultad de Ciencias Agrarias y Veterinaria; Argentina
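The workflow summarized in the description (train a regression model on SNP genotypes with breeding values as the phenotype, check actual vs. predicted values on out-of-bag data, then retrieve the most relevant SNPs) can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses only scikit-learn's Random Forest, simulated genotypes coded 0/1/2, two hypothetical causal SNPs, and arbitrary hyperparameters. Ranking SNPs by impurity-based feature importance is an assumption made here, since the description does not state how relevance was defined. The function name rank_snps_by_importance is likewise hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_snps_by_importance(genotypes, ebv, n_trees=200, seed=0):
    """Fit a Random Forest regressor and rank SNPs by feature importance.

    Returns the SNP column indices sorted from most to least important,
    plus the Pearson correlation and RMSE between actual values and
    out-of-bag (OOB) predictions.
    """
    model = RandomForestRegressor(
        n_estimators=n_trees, oob_score=True, random_state=seed
    )
    model.fit(genotypes, ebv)

    # OOB predictions come from trees that did not see the animal in training.
    oob_pred = model.oob_prediction_
    oob_r = float(np.corrcoef(ebv, oob_pred)[0, 1])
    oob_rmse = float(np.sqrt(np.mean((ebv - oob_pred) ** 2)))

    # Impurity-based importances (assumed relevance criterion for this sketch).
    order = np.argsort(model.feature_importances_)[::-1]
    return order, oob_r, oob_rmse


# Simulated data: all sizes and effects below are hypothetical.
rng = np.random.default_rng(42)
n_animals, n_snps = 300, 50
genotypes = rng.choice([0, 1, 2], size=(n_animals, n_snps), p=[0.25, 0.5, 0.25])

causal_effects = {3: 2.0, 17: -1.5}  # SNP column index -> additive effect
ebv = sum(effect * genotypes[:, snp] for snp, effect in causal_effects.items())
ebv = ebv + rng.normal(0.0, 0.3, size=n_animals)  # residual noise

order, oob_r, oob_rmse = rank_snps_by_importance(genotypes, ebv)
top_snps = order[:2].tolist()
print("Top-ranked SNP columns:", top_snps)
print("OOB Pearson r:", round(oob_r, 3), "OOB RMSE:", round(oob_rmse, 3))
```

The same structure would apply to gradient-boosting models, with validation folds taking the place of the out-of-bag predictions.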
Source
MethodsX 9 : 101733 (2022)
Subject
Single Nucleotide Polymorphism
Dairy Cattle
Milk Production
Milk Protein
Bioinformatics
Loci
Polimorfismo de un Solo Nucleótidos
Ganado de Leche
Producción Lechera
Proteínas de la Leche
Bioinformática
Milk Fat Content
Machine Learning Algorithms
Contenido de Grasa Láctea
Algoritmos de Aprendizaje Automático
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
INTA Digital (INTA)
Institution
Instituto Nacional de Tecnología Agropecuaria
OAI identifier
oai:localhost:20.500.12123/11954

id INTADig_27c0c6a34410af9ff59e5eb617add6ab
oai_identifier_str oai:localhost:20.500.12123/11954
network_acronym_str INTADig
repository_id_str l
network_name_str INTA Digital (INTA)
dc.title.none.fl_str_mv Methodology for the identification of relevant loci for milk traits in dairy cattle, using machine learning algorithms
dc.creator.none.fl_str_mv Raschia, Maria Agustina
Ríos, Pablo Javier
Maizon, Daniel Omar
Demitrio, Daniel Arturo
Poli, Mario Andres
dc.subject.none.fl_str_mv Single Nucleotide Polymorphism
Dairy Cattle
Milk Production
Milk Protein
Bioinformatics
Loci
Polimorfismo de un Solo Nucleótidos
Ganado de Leche
Producción Lechera
Proteínas de la Leche
Bioinformática
Milk Fat Content
Machine Learning Algorithms
Contenido de Grasa Láctea
Algoritmos de Aprendizaje Automático
publishDate 2022
dc.date.none.fl_str_mv 2022-05-26T17:34:45Z
2022-05-26T17:34:45Z
2022
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/20.500.12123/11954
https://www.sciencedirect.com/science/article/pii/S2215016122001145
2215-0161
https://doi.org/10.1016/j.mex.2022.101733
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repograntAgreement/INTA/2019-PE-E6-I145-001/2019-PE-E6-I145-001/AR./Mejora genética objetiva para aumentar la eficiencia de los sistemas de producción animal.
info:eu-repograntAgreement/INTA/2019-PT-E9-I180-001/2019-PT-E9-I180-001/AR./TICs y gestión de Big Data
info:eu-repograntAgreement/INTA/2019-PT-E6-I513-001/2019-PT-E6-I513-001/AR./Plataforma de mejoramiento animal
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
eu_rights_str_mv openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier
publisher.none.fl_str_mv Elsevier
dc.source.none.fl_str_mv MethodsX 9 : 101733 (2022)
reponame:INTA Digital (INTA)
instname:Instituto Nacional de Tecnología Agropecuaria
reponame_str INTA Digital (INTA)
collection INTA Digital (INTA)
instname_str Instituto Nacional de Tecnología Agropecuaria
repository.name.fl_str_mv INTA Digital (INTA) - Instituto Nacional de Tecnología Agropecuaria
repository.mail.fl_str_mv tripaldi.nicolas@inta.gob.ar