Size and structure of the sequence space of repeat proteins

Authors
Marchi, Jacopo; Galpern, Ezequiel Alejandro; Espada, Rocio; Ferreiro, Diego; Walczak, Aleksandra M.; Mora, Thierry
Publication year
2019
Language
English
Resource type
article
Status
published version
Description
The coding space of protein sequences is shaped by evolutionary constraints set by requirements of function and stability. We show that the coding space of a given protein family (the total number of sequences in that family) can be estimated using maximum entropy models trained on multiple sequence alignments of naturally occurring amino acid sequences. We analyzed and calculated the size of three abundant repeat protein families, whose members are large proteins made of many repetitions of conserved portions of ~30 amino acids. While amino acid conservation at each position of the alignment explains most of the reduction of diversity relative to completely random sequences, we found that correlations between amino acid usage at different positions significantly impact that diversity. We quantified the impact of different types of correlations, functional and evolutionary, on sequence diversity. Analysis of the detailed structure of the coding space of the families revealed a rugged landscape, with many local energy minima of varying sizes arranged in a hierarchical structure, reminiscent of the frustrated energy landscapes of spin glasses in physics. This clustered structure indicates a multiplicity of subtypes within each family, and suggests new strategies for protein design.
Affiliation: Marchi, Jacopo. Ecole Normale Supérieure; France
Affiliation: Galpern, Ezequiel Alejandro. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
Affiliation: Espada, Rocio. PSL University; France
Affiliation: Ferreiro, Diego. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
Affiliation: Walczak, Aleksandra M. Ecole Normale Supérieure; France
Affiliation: Mora, Thierry. Ecole Normale Supérieure; France
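The description compares the diversity allowed by position-wise conservation with that of completely random sequences. The comparison can be illustrated with a minimal sketch that treats every alignment column as independent and sums the per-column Shannon entropies. This is only an illustration under stated assumptions, not the authors' model: the toy alignment and all function names are hypothetical, and the correlations between positions that the description reports as significant are ignored, as are gaps, pseudocounts, and sequence reweighting.

```python
import math
from collections import Counter


def site_entropies(alignment):
    """Shannon entropy (in nats) of each column of an aligned set of sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts.values())
        entropies.append(
            -sum((c / total) * math.log(c / total) for c in counts.values())
        )
    return entropies


def independent_site_log_size(alignment):
    """Log of the effective number of sequences when columns are independent."""
    return sum(site_entropies(alignment))


def random_log_size(length, alphabet_size=20):
    """Log of the number of completely random sequences of a given length."""
    return length * math.log(alphabet_size)


# Hypothetical toy alignment: two fully conserved columns, two variable ones.
toy_alignment = ["ACDA", "ACEA", "ACDG", "ACEG"]

log_size = independent_site_log_size(toy_alignment)
effective_size = math.exp(log_size)
random_size = math.exp(random_log_size(len(toy_alignment[0])))

print(f"effective size (independent sites): {effective_size:.1f}")
print(f"size of the random sequence space:  {random_size:.0f}")
```

In this toy case the two conserved columns contribute no entropy, so conservation alone shrinks the space far below the 20^4 random sequences. Accounting for correlations between columns would require a model with couplings between positions and would change the estimate further.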
Subject
protein folding
protein design
protein evolution
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/123547

Handle
http://hdl.handle.net/11336/123547
Citation
Marchi, Jacopo; Galpern, Ezequiel Alejandro; Espada, Rocio; Ferreiro, Diego; Walczak, Aleksandra M.; et al.; Size and structure of the sequence space of repeat proteins; Public Library of Science; PLOS Computational Biology; 15; 8; 8-2019; 1-23
ISSN
1553-734X
DOI
10.1371/journal.pcbi.1007282