Size and structure of the sequence space of repeat proteins
- Authors
- Marchi, Jacopo; Galpern, Ezequiel Alejandro; Espada, Rocio; Ferreiro, Diego; Walczak, Aleksandra M.; Mora, Thierry
- Publication year
- 2019
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- The coding space of protein sequences is shaped by evolutionary constraints set by requirements of function and stability. We show that the coding space of a given protein family (the total number of sequences in that family) can be estimated using maximum entropy models trained on multiple sequence alignments of naturally occurring amino acid sequences. We analyzed and calculated the size of three abundant repeat protein families, whose members are large proteins made of many repetitions of conserved portions of ~30 amino acids. While amino acid conservation at each position of the alignment explains most of the reduction of diversity relative to completely random sequences, we found that correlations between amino acid usage at different positions significantly impact that diversity. We quantified the impact of different types of correlations, functional and evolutionary, on sequence diversity. Analysis of the detailed structure of the coding space of the families revealed a rugged landscape, with many local energy minima of varying sizes arranged in a hierarchical structure, reminiscent of the frustrated energy landscapes of spin glasses in physics. This clustered structure indicates a multiplicity of subtypes within each family, and suggests new strategies for protein design.
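The idea of measuring the size of a sequence family through entropy can be sketched on a toy example. The snippet below is only an illustration under stated assumptions, not the authors' method: it uses a hypothetical four-sequence alignment (`toy_alignment`, invented here) and a site-independent model, in which the entropy is the sum of per-column Shannon entropies and the effective number of sequences is its exponential. A model that also reproduces correlations between positions, as the abstract describes, adds constraints and so can only have equal or lower entropy than this independent-site value.

```python
import math
from collections import Counter


def site_entropies(alignment):
    """Return the Shannon entropy (in nats) of each alignment column."""
    length = len(alignment[0])
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts.values())
        entropies.append(
            -sum((c / total) * math.log(c / total) for c in counts.values())
        )
    return entropies


# Hypothetical toy alignment (NOT data from the paper): 4 sequences, 3 columns.
toy_alignment = ["ACD", "ACE", "ACD", "GCE"]

# Entropy of the site-independent model: sum over columns.
s_independent = sum(site_entropies(toy_alignment))

# Entropy of completely random sequences over the 20 amino acids.
s_random = len(toy_alignment[0]) * math.log(20)

# Effective number of sequences allowed by per-site conservation alone.
effective_size = math.exp(s_independent)

print(f"independent-site entropy: {s_independent:.3f} nats")
print(f"random-sequence entropy:  {s_random:.3f} nats")
print(f"effective family size:    {effective_size:.2f} (of {20 ** 3} random sequences)")
```

The fully conserved middle column contributes zero entropy, which is how per-position conservation shrinks the estimated family size relative to the random baseline.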
Affiliation: Marchi, Jacopo. Ecole Normale Supérieure; France
Affiliation: Galpern, Ezequiel Alejandro. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
Affiliation: Espada, Rocio. PSL University; France
Affiliation: Ferreiro, Diego. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
Affiliation: Walczak, Aleksandra M. Ecole Normale Supérieure; France
Affiliation: Mora, Thierry. Ecole Normale Supérieure; France
- Subject
- protein folding
- protein design
- protein evolution
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/123547
View the full record metadata
| id | CONICETDig_4f6f0ce0eb5ed1430d77ba6e1933b271 |
|---|---|
| oai_identifier_str | oai:ri.conicet.gov.ar:11336/123547 |
| network_acronym_str | CONICETDig |
| repository_id_str | 3498 |
| network_name_str | CONICET Digital (CONICET) |
| dc.title.none.fl_str_mv | Size and structure of the sequence space of repeat proteins |
| title | Size and structure of the sequence space of repeat proteins |
| spellingShingle | Size and structure of the sequence space of repeat proteins; Marchi, Jacopo; protein folding; protein design; protein evolution |
| title_short | Size and structure of the sequence space of repeat proteins |
| title_full | Size and structure of the sequence space of repeat proteins |
| title_fullStr | Size and structure of the sequence space of repeat proteins |
| title_full_unstemmed | Size and structure of the sequence space of repeat proteins |
| title_sort | Size and structure of the sequence space of repeat proteins |
| dc.creator.none.fl_str_mv | Marchi, Jacopo; Galpern, Ezequiel Alejandro; Espada, Rocio; Ferreiro, Diego; Walczak, Aleksandra M.; Mora, Thierry |
| author | Marchi, Jacopo |
| author_facet | Marchi, Jacopo; Galpern, Ezequiel Alejandro; Espada, Rocio; Ferreiro, Diego; Walczak, Aleksandra M.; Mora, Thierry |
| author_role | author |
| author2 | Galpern, Ezequiel Alejandro; Espada, Rocio; Ferreiro, Diego; Walczak, Aleksandra M.; Mora, Thierry |
| author2_role | author; author; author; author; author |
| dc.subject.none.fl_str_mv | protein folding; protein design; protein evolution |
| topic | protein folding; protein design; protein evolution |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.6 https://purl.org/becyt/ford/1 |
| publishDate | 2019 |
| dc.date.none.fl_str_mv | 2019-08 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/123547 Marchi, Jacopo; Galpern, Ezequiel Alejandro; Espada, Rocio; Ferreiro, Diego; Walczak, Aleksandra M.; et al.; Size and structure of the sequence space of repeat proteins; Public Library of Science; Plos Computational Biology; 15; 8; 8-2019; 1-23 1553-734X CONICET Digital CONICET |
| url | http://hdl.handle.net/11336/123547 |
| identifier_str_mv | Marchi, Jacopo; Galpern, Ezequiel Alejandro; Espada, Rocio; Ferreiro, Diego; Walczak, Aleksandra M.; et al.; Size and structure of the sequence space of repeat proteins; Public Library of Science; Plos Computational Biology; 15; 8; 8-2019; 1-23 1553-734X CONICET Digital CONICET |
| dc.language.none.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/doi/10.1371/journal.pcbi.1007282 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf application/pdf |
| dc.publisher.none.fl_str_mv | Public Library of Science |
| publisher.none.fl_str_mv | Public Library of Science |
| dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| reponame_str | CONICET Digital (CONICET) |
| collection | CONICET Digital (CONICET) |
| instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |