Methodology for the identification of relevant loci for milk traits in dairy cattle, using machine learning algorithms
- Authors
- Raschia, Maria Agustina; Ríos, Pablo Javier; Maizon, Daniel Omar; Demitrio, Daniel Arturo; Poli, Mario Andres
- Year of publication
- 2022
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Machine learning methods have been considered efficient at identifying single nucleotide polymorphisms (SNPs) underlying a trait of interest. This study aimed to construct predictive models using machine learning algorithms in order to identify the loci that best explain the variance in milk traits of dairy cattle. Further objectives were to validate the results by comparison with previously reported relevant regions and to retrieve the pathways overrepresented among the genes flanking relevant SNPs. Regression models using the XGBoost (XGB), LightGBM (LGB), and Random Forest (RF) algorithms were trained with estimated breeding values for milk production (EBVM), milk fat content (EBVF), and milk protein content (EBVP) as phenotypes and genotypes at 40,417 SNPs as predictor variables. To evaluate their efficiency, metrics for actual vs. predicted values were determined in validation folds (XGB and LGB) and on out-of-bag data (RF). Fewer than 4,500 relevant SNPs were retrieved for each trait. Among the genes flanking them, signaling and transmembrane transporter activities were overrepresented. The trained models (i) predicted breeding values for animals not included in the dataset and (ii) were efficient in identifying a subset of SNPs explaining phenotypic variation. The results obtained using the XGB and LGB algorithms agreed with previous results. Therefore, the proposed method could be applied in future association studies on milk traits.
Instituto de Genética
Affiliation: Raschia, Maria Agustina. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Genética; Argentina
Affiliation: Ríos, Pablo J. Universidad de Buenos Aires; Argentina
Affiliation: Ríos, Pablo J. Universidad Nacional de La Plata. Facultad de Ciencias Exactas; Argentina
Affiliation: Maizon, Daniel Omar. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria Anguil; Argentina
Affiliation: Maizon, Daniel Omar. Universidad Nacional de La Pampa. Facultad de Agronomía; Argentina
Affiliation: Demitrio, Daniel Arturo. Instituto Nacional de Tecnología Agropecuaria (INTA). Dirección General de Sistemas de Información, Comunicación y Procesos. Gerencia de Informática y Gestión de la Información; Argentina
Affiliation: Demitrio, Daniel Arturo. Universidad Nacional de La Plata. Facultad de Ciencias Exactas; Argentina
Affiliation: Poli, Mario Andres. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Genética; Argentina
Affiliation: Poli, Mario Andres. Universidad del Salvador. Facultad de Ciencias Agrarias y Veterinaria; Argentina
- Source
- MethodsX 9 : 101733 (2022)
- Subject
-
Single Nucleotide Polymorphism
Dairy Cattle
Milk Production
Milk Protein
Bioinformatics
Loci
Milk Fat Content
Machine Learning Algorithms
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- Repository
- Institution
- Instituto Nacional de Tecnología Agropecuaria
- OAI identifier
- oai:localhost:20.500.12123/11954
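
The workflow summarized in the description above (train a tree-based regressor on SNP genotypes, score actual vs. predicted values on held-out folds, then rank SNPs by importance) might look roughly like the sketch below. It is not the authors' pipeline: it swaps in a small pure-Python gradient boosting of decision stumps as a stand-in for XGBoost/LightGBM, runs on synthetic genotypes, and every name in it (`fit_stump`, `boost`, `CAUSAL`, the fold scheme, the hyperparameters) is a hypothetical choice made for illustration.

```python
import math
import random

THRESHOLDS = (0.5, 1.5)  # genotypes coded 0/1/2 allow only two useful splits


def fit_stump(X, resid):
    """Return (gain, snp, threshold, left_mean, right_mean) for the best split."""
    n, total = len(resid), sum(resid)
    best = (0.0, 0, THRESHOLDS[0], 0.0, 0.0)  # no-op stump if nothing helps
    for j in range(len(X[0])):
        for t in THRESHOLDS:
            left = [r for row, r in zip(X, resid) if row[j] <= t]
            n_l, s_l = len(left), sum(left)
            n_r, s_r = n - n_l, total - s_l
            if n_l == 0 or n_r == 0:
                continue
            # Reduction in squared error relative to predicting the mean.
            gain = s_l ** 2 / n_l + s_r ** 2 / n_r - total ** 2 / n
            if gain > best[0]:
                best = (gain, j, t, s_l / n_l, s_r / n_r)
    return best


def boost(X, y, rounds=50, lr=0.3):
    """Fit boosted stumps; return (base, stumps, per-SNP accumulated gain)."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        gain, j, t, lv, rv = fit_stump(X, resid)
        stumps.append((j, t, lr * lv, lr * rv))
        importance[j] += gain
        pred = [p + lr * (lv if row[j] <= t else rv) for p, row in zip(pred, X)]
    return base, stumps, importance


def predict(model, row):
    base, stumps, _ = model
    return base + sum(lv if row[j] <= t else rv for j, t, lv, rv in stumps)


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((v - mb) ** 2 for v in b))


# Synthetic data (assumption): 200 animals, 20 SNPs, two causal SNPs.
rng = random.Random(0)
N_ANIMALS, N_SNPS, CAUSAL = 200, 20, {3: 2.0, 7: -1.5}
X = [[rng.choice((0, 1, 2)) for _ in range(N_SNPS)] for _ in range(N_ANIMALS)]
y = [sum(b * row[j] for j, b in CAUSAL.items()) + rng.gauss(0, 0.1) for row in X]

# 5-fold validation: collect actual vs. predicted values on held-out animals.
K = 5
actual, predicted = [], []
for k in range(K):
    train = [i for i in range(N_ANIMALS) if i % K != k]
    test = [i for i in range(N_ANIMALS) if i % K == k]
    model = boost([X[i] for i in train], [y[i] for i in train])
    actual += [y[i] for i in test]
    predicted += [predict(model, X[i]) for i in test]

rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
r = pearson(actual, predicted)

# Rank SNPs by accumulated gain from a model fitted on all animals.
_, _, importance = boost(X, y)
relevant = sorted(range(N_SNPS), key=lambda j: importance[j], reverse=True)[:2]
print(f"RMSE={rmse:.3f}  r={r:.3f}  top SNPs={relevant}")
```

Accumulated split gain is only one of several importance measures; a real analysis would also need a principled cut-off for calling a SNP relevant, which this sketch replaces with a fixed top-2 selection.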
View full record metadata
id | INTADig_27c0c6a34410af9ff59e5eb617add6ab |
---|---|
oai_identifier_str | oai:localhost:20.500.12123/11954 |
network_acronym_str | INTADig |
repository_id_str | l |
network_name_str | INTA Digital (INTA) |
dc.title.none.fl_str_mv | Methodology for the identification of relevant loci for milk traits in dairy cattle, using machine learning algorithms |
dc.creator.none.fl_str_mv | Raschia, Maria Agustina; Ríos, Pablo Javier; Maizon, Daniel Omar; Demitrio, Daniel Arturo; Poli, Mario Andres |
dc.subject.none.fl_str_mv | Single Nucleotide Polymorphism; Dairy Cattle; Milk Production; Milk Protein; Bioinformatics; Loci; Milk Fat Content; Machine Learning Algorithms |
publishDate | 2022 |
dc.date.none.fl_str_mv | 2022-05-26T17:34:45Z; 2022-05-26T17:34:45Z; 2022 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/20.500.12123/11954; https://www.sciencedirect.com/science/article/pii/S2215016122001145; 2215-0161; https://doi.org/10.1016/j.mex.2022.101733 |
url | http://hdl.handle.net/20.500.12123/11954; https://www.sciencedirect.com/science/article/pii/S2215016122001145; https://doi.org/10.1016/j.mex.2022.101733 |
identifier_str_mv | 2215-0161 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repograntAgreement/INTA/2019-PE-E6-I145-001/2019-PE-E6-I145-001/AR./Mejora genética objetiva para aumentar la eficiencia de los sistemas de producción animal.; info:eu-repograntAgreement/INTA/2019-PT-E9-I180-001/2019-PT-E9-I180-001/AR./TICs y gestión de Big Data; info:eu-repograntAgreement/INTA/2019-PT-E6-I513-001/2019-PT-E6-I513-001/AR./Plataforma de mejoramiento animal |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | Elsevier |
publisher.none.fl_str_mv | Elsevier |
dc.source.none.fl_str_mv | MethodsX 9 : 101733 (2022); reponame:INTA Digital (INTA); instname:Instituto Nacional de Tecnología Agropecuaria |
reponame_str | INTA Digital (INTA) |
collection | INTA Digital (INTA) |
instname_str | Instituto Nacional de Tecnología Agropecuaria |
repository.name.fl_str_mv | INTA Digital (INTA) - Instituto Nacional de Tecnología Agropecuaria |
repository.mail.fl_str_mv | tripaldi.nicolas@inta.gob.ar |