An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach

Authors
Pazos Obregón, Flavio; Palazzo, Martin; Soto, Pablo; Guerberoff, Gustavo; Yankilevich, Patricio; Cantera, Rafael
Year of publication
2019
Language
English
Resource type
article
Status
published version
Description
Background: Assembly and function of neuronal synapses require the coordinated expression of a still undetermined set of genes. We previously trained an ensemble machine learning model to assign to every protein-coding gene in Drosophila melanogaster a probability of having a synaptic function. This approach resulted in the publication of a catalogue of 893 genes that we postulated to be highly enriched in genes with a still undocumented synaptic function. Since then, the scientific community has experimentally identified 79 new synaptic genes. Here we use these new empirical data to evaluate our original prediction. We also implement a series of changes to the training scheme of our model and, using the new data, demonstrate that these changes improve its predictive power. Finally, we added the new synaptic genes to the training set and trained a new model, obtaining a new, enhanced catalogue of putative synaptic genes.
Results: The retrospective analysis demonstrates that our original catalogue was significantly enriched in new synaptic genes. When the changes to the training scheme were implemented using the original training set, we obtained even higher enrichment. Finally, applying the new training scheme to a training set that includes the 79 new synaptic genes resulted in an enhanced catalogue of putative synaptic genes. Here we present this new catalogue and announce that a regularly updated version will be available online at http://synapticgenes.bnd.edu.uy.
Conclusions: We show that training an ensemble of machine learning classifiers solely on the whole-body temporal transcription profiles of known synaptic genes resulted in a catalogue significantly enriched in undiscovered synaptic genes. Using new empirical data provided by the scientific community, we validated our original approach, improved our model and obtained an arguably more precise prediction. This approach reduces the number of genes to be tested through hypothesis-driven experimentation and will facilitate our understanding of neuronal function.
Availability: http://synapticgenes.bnd.edu.uy
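The abstract does not specify which classifiers make up the ensemble, how negative examples are chosen, or which statistical test supports the enrichment claim. The sketch below is therefore purely illustrative and is not the authors' implementation. It rests on several assumptions. Each gene's whole-body temporal transcription profile is treated as a vector of expression values across developmental time points. Each ensemble member is a logistic regression trained on the known synaptic genes plus an equal-sized random sample of the remaining genes, which are treated as negatives. Enrichment of a catalogue in newly reported synaptic genes is assessed with a one-sided hypergeometric test. The function names (`ensemble_scores`, `catalogue`, `hypergeom_sf`) and the toy data are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Set

# Hypothetical representation: expression values across developmental time points.
Profile = Sequence[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: Sequence[float], x: Profile) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_logistic(xs: List[Profile], ys: List[int], epochs: int = 200, lr: float = 0.1):
    """Fit one (assumed) ensemble member: logistic regression by batch gradient descent."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b + dot(w, x)) - y  # d(log-loss)/d(logit)
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / len(xs) for wj, g in zip(w, gw)]
        b -= lr * gb / len(xs)
    return w, b


def ensemble_scores(profiles: Dict[str, Profile], positives: Set[str],
                    n_members: int = 25, seed: int = 0) -> Dict[str, float]:
    """Average member probabilities. Each member sees all positives plus a random
    sample of unlabeled genes used as negatives (an assumed scheme)."""
    rng = random.Random(seed)
    pos = sorted(positives)
    unlabeled = [g for g in profiles if g not in positives]
    totals = {g: 0.0 for g in profiles}
    for _ in range(n_members):
        negs = rng.sample(unlabeled, min(len(pos), len(unlabeled)))
        xs = [profiles[g] for g in pos + negs]
        ys = [1] * len(pos) + [0] * len(negs)
        w, b = train_logistic(xs, ys)
        for g, x in profiles.items():
            totals[g] += sigmoid(b + dot(w, x))
    return {g: s / n_members for g, s in totals.items()}


def catalogue(scores: Dict[str, float], positives: Set[str], size: int) -> List[str]:
    """Top-ranked genes that are not already training positives."""
    candidates = [g for g in scores if g not in positives]
    return sorted(candidates, key=lambda g: scores[g], reverse=True)[:size]


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / math.comb(N, n)


if __name__ == "__main__":
    # Toy data only: five known "synaptic" genes, twenty others, and one
    # unlabeled gene "h" whose profile resembles the known synaptic genes.
    positives = {f"p{i}" for i in range(5)}
    profiles = {f"p{i}": [1.0, 0.5, 1.0 + 0.1 * i] for i in range(5)}
    profiles.update({f"n{i}": [-1.0, -0.5, -1.0 - 0.05 * i] for i in range(20)})
    profiles["h"] = [0.9, 0.6, 1.1]
    scores = ensemble_scores(profiles, positives)
    print(catalogue(scores, positives, size=1))
```

The retrospective evaluation can be mirrored by calling `hypergeom_sf(overlap, N, K, n)` with these arguments: `N`, the number of protein-coding genes scored; `K`, the number of newly identified synaptic genes (79 according to the abstract); `n`, the catalogue size (893 for the original catalogue); and `overlap`, the number of new synaptic genes found in the catalogue. This record does not report the observed overlap, so no p-value is computed here.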
Affiliation: Pazos Obregón, Flavio. Instituto de Investigaciones Biológicas "Clemente Estable"; Uruguay
Affiliation: Palazzo, Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina
Affiliation: Soto, Pablo. Instituto de Investigaciones Biológicas "Clemente Estable"; Uruguay
Affiliation: Guerberoff, Gustavo. Universidad de la República; Uruguay
Affiliation: Yankilevich, Patricio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina
Affiliation: Cantera, Rafael. Instituto de Investigaciones Biológicas "Clemente Estable"; Uruguay
Subjects
DROSOPHILA MELANOGASTER
GENE FUNCTION PREDICTION
MACHINE LEARNING
SYNAPTIC GENES
TEMPORAL TRANSCRIPTION PROFILES
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/124570

Citation
Pazos Obregón, Flavio; Palazzo, Martin; Soto, Pablo; Guerberoff, Gustavo; Yankilevich, Patricio; Cantera, Rafael. An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach. BMC Genomics (BioMed Central), vol. 20, no. 1, December 2019, pp. 1-8.
ISSN
1471-2164
Handle
http://hdl.handle.net/11336/124570
DOI
10.1186/s12864-019-6380-z
Publisher URL
https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-019-6380-z