Pipeline for transferring annotations between proteins beyond globular domains

Authors
Martinez Perez, Elizabeth; Pajkos, Mátyás; Tosatto, Silvio C. E.; Gibson, Toby James; Dosztanyi, Zsuzsanna; Marino, Cristina Ester
Year of publication
2023
Language
English
Resource type
article
Status
published version
Description
Background: DisProt is the primary repository of intrinsically disordered proteins (IDPs). The database is manually curated, and its annotations have strong experimental support. DisProt currently contains a relatively small number of proteins, which highlights the importance of transferring annotations of verified disorder state and associated functions to homologous proteins in other species, thereby providing them with valuable information for understanding their biological roles. While the principles and practicalities of homology transfer are well established for globular proteins, they are largely lacking for disordered proteins.
Methods: We used DisProt to evaluate the transferability of annotation terms to orthologous proteins. For each protein, we searched for its orthologs, assuming that they share a similar function. We then built a multiple sequence alignment (MSA) for each protein and its orthologs. Because disordered sequences evolve quickly and can be hard to align, we implemented alignment quality control steps to ensure robust alignments before mapping the annotations.
Results: We have designed a pipeline that obtains good-quality MSAs and transfers annotations from any protein to its orthologs. Applying the pipeline to DisProt, the 1,731 entries with 5,623 annotations reach 97,555 orthologs, and a total of 301,190 terms are transferred by homology. We also provide a web server for browsing the results for DisProt proteins and for running the pipeline on any other protein. The server, Homology Transfer IDP (HoTIDP), is accessible at http://hotidp.leloir.org.ar.
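The mapping step summarized under Methods (align each protein with its orthologs, check alignment quality, then map annotations) can be illustrated with a minimal Python sketch. It is not the HoTIDP implementation: it assumes orthologs have already been identified and aligned, represents the MSA as a dict of equal-length gapped strings, and uses a hypothetical pairwise-identity threshold (`min_identity`) and region-coverage threshold (`min_coverage`) as stand-ins for the paper's quality control steps, whose exact criteria are not given in this record. All names (`Annotation`, `transfer`, the sequence IDs and term labels) are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # 1-based, inclusive position in the annotated sequence
    end: int    # 1-based, inclusive
    term: str   # placeholder label for an annotation term


def column_to_residue(aligned: str) -> dict[int, int]:
    """Map each alignment column that holds a residue to its 1-based sequence position."""
    mapping, pos = {}, 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            pos += 1
            mapping[col] = pos
    return mapping


def percent_identity(a: str, b: str) -> float:
    """Identity over columns where both rows have a residue (hypothetical QC metric)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def transfer(msa: dict[str, str], source_id: str, annotations: list[Annotation],
             min_identity: float = 30.0, min_coverage: float = 0.8) -> dict[str, list[Annotation]]:
    """Project source annotations onto every aligned ortholog that passes QC."""
    src = msa[source_id]
    # Invert the source map: residue position -> alignment column.
    pos_to_col = {p: c for c, p in column_to_residue(src).items()}
    result = {}
    for target_id, tgt in msa.items():
        if target_id == source_id:
            continue
        # Alignment QC (hypothetical): skip orthologs that align poorly to the source.
        if percent_identity(src, tgt) < min_identity:
            continue
        tgt_map = column_to_residue(tgt)
        hits = []
        for ann in annotations:
            cols = [pos_to_col.get(p) for p in range(ann.start, ann.end + 1)]
            # Keep only source residues whose column holds a target residue.
            mapped = [tgt_map[c] for c in cols if c in tgt_map]
            coverage = len(mapped) / (ann.end - ann.start + 1)
            if mapped and coverage >= min_coverage:
                hits.append(Annotation(min(mapped), max(mapped), ann.term))
        result[target_id] = hits
    return result


msa = {
    "query": "MKT-AEEL",
    "orth1": "MKTQAEEL",  # one insertion relative to the query
    "orth2": "M--QA-EL",  # gaps inside both annotated regions
    "orth3": "MRSQGDDI",  # low identity, removed by QC
}
anns = [Annotation(1, 3, "region_A"), Annotation(4, 7, "disorder")]
print(transfer(msa, "query", anns))
```

In this toy example `orth1` receives both regions, with the second shifted to positions 5-8 by its extra residue; `orth2` passes the identity filter but receives nothing, because its gaps leave less than 80% of each region covered; and `orth3` is excluded entirely by the identity filter. Projecting regions through alignment columns, rather than copying residue numbers, is what lets a transfer tolerate insertions and deletions between species.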
Affiliation: Martinez Perez, Elizabeth. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina. Fundación Instituto Leloir; Argentina
Affiliation: Pajkos, Mátyás. Eötvös University; Hungary
Affiliation: Tosatto, Silvio C. E. Università di Padova; Italy
Affiliation: Gibson, Toby James. European Molecular Biology Laboratory Heidelberg; Germany
Affiliation: Dosztanyi, Zsuzsanna. Eötvös University; Hungary
Affiliation: Marino, Cristina Ester. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina. Fundación Instituto Leloir; Argentina
Subjects
ANNOTATION
DISPROT
HOMOLOGY TRANSFER
INTRINSICALLY DISORDERED PROTEINS
MULTIPLE SEQUENCE ALIGNMENT
ONTOLOGY TERMS
ORTHOLOGOUS PROTEINS
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/228717

Publisher
John Wiley & Sons
Publication date
2023-05
Citation
Martinez Perez, Elizabeth; Pajkos, Mátyás; Tosatto, Silvio C. E.; Gibson, Toby James; Dosztanyi, Zsuzsanna; et al.; Pipeline for transferring annotations between proteins beyond globular domains; John Wiley & Sons; Protein Science; 32; 7; 5-2023; 1-21
ISSN
0961-8368
Handle
http://hdl.handle.net/11336/228717
DOI
10.1002/pro.4655
Publisher URL
https://onlinelibrary.wiley.com/doi/10.1002/pro.4655
Subject classification (FORD)
https://purl.org/becyt/ford/1.7
https://purl.org/becyt/ford/1
Format
application/pdf