Hierarchical deep learning for predicting GO annotations by integrating protein knowledge

Authors
Merino, Gabriela Alejandra; Saidi, Rabie; Milone, Diego Humberto; Stegmayer, Georgina; Martin, Maria J.
Publication year
2022
Language
English
Resource type
article
Status
published version
Description
Motivation: Experimental testing and manual curation are the most precise ways of assigning Gene Ontology (GO) terms that describe protein functions. However, they are expensive and time-consuming, and they cannot keep pace with the exponential growth of data generated by high-throughput sequencing methods. Hence, researchers need reliable computational systems to help fill the gap with automatic function prediction. The results of the last Critical Assessment of Function Annotation challenge revealed that GO-term prediction remains a very challenging task. Recent developments in deep learning are significantly pushing the frontiers of protein research, leading to new knowledge thanks to the integration of data from multiple sources. However, the deep models developed so far for functional prediction focus mainly on sequence data and have not yet achieved breakthrough performance. Results: We propose DeeProtGO, a novel deep-learning model for predicting GO annotations by integrating protein knowledge. DeeProtGO was trained to solve 18 different prediction problems, defined by the three GO sub-ontologies, the type of proteins, and the taxonomic kingdom. Our experiments showed higher prediction quality when more protein knowledge is integrated. We also benchmarked DeeProtGO against state-of-the-art methods on public datasets and showed that it can effectively improve the prediction of GO annotations.
Affiliation: Merino, Gabriela Alejandra. European Molecular Biology Laboratory. European Bioinformatics Institute; United Kingdom. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Saidi, Rabie. European Molecular Biology Laboratory. European Bioinformatics Institute; United Kingdom
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Martin, Maria J. European Molecular Biology Laboratory. European Bioinformatics Institute; United Kingdom
Subject
AUTOMATIC FUNCTION PREDICTION
PROTEIN ANNOTATION
DEEP LEARNING
KNOWLEDGE INTEGRATION
GO TERMS PREDICTION
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/210988

Subject classification
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
Handle
http://hdl.handle.net/11336/210988
Citation
Merino, Gabriela Alejandra; Saidi, Rabie; Milone, Diego Humberto; Stegmayer, Georgina; Martin, Maria J.; Hierarchical deep learning for predicting GO annotations by integrating protein knowledge; Oxford University Press; Bioinformatics (Oxford, England); 38; 19; 8-2022; 4488-4496
ISSN
1367-4803
Alternative identifiers
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btac536/6656346
doi:10.1093/bioinformatics/btac536
Publisher
Oxford University Press