Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case
- Authors
- Espariz, Martin; Zuljan, Federico Alberto; Esteban, Luis; Magni, Christian
- Publication year
- 2016
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Bacillus pumilus group strains have been studied due to their agronomic, biotechnological or pharmaceutical potential. Classifying strains of this taxonomic group at the species level is challenging because the group comprises seven species that share over 99.5% 16S rRNA gene identity. In this study, a whole-genome in silico approach was first used to accurately demarcate B. pumilus group strains, as a case of highly phylogenetically related taxa, at the species level. To achieve this, and consequently to validate or correct the taxonomic identities of genomes in public databases, average nucleotide identity correlation, core-based phylogenomic and gene function repertoire analyses were performed. Ultimately, more than 50% of these genomes were found to be misclassified. Hierarchical clustering of gene functional repertoires was also used to infer ecotypes among B. pumilus group species. Furthermore, for the first time, the machine-learning algorithm Random Forest was used to rank genes in order of their importance for species classification. We found that ybbP, a gene involved in the synthesis of cyclic di-AMP, was the most important gene for accurately predicting species identity among B. pumilus group strains. Finally, principal component analysis was used to classify strains based on the distances between their ybbP genes. The methodologies described could be used more broadly to identify other highly phylogenetically related species in metagenomic or epidemiological assessments.
Affiliation: Espariz, Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Biología Molecular y Celular de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Instituto de Biología Molecular y Celular de Rosario; Argentina
Affiliation: Zuljan, Federico Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Biología Molecular y Celular de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Instituto de Biología Molecular y Celular de Rosario; Argentina
Affiliation: Esteban, Luis. Universidad Nacional de Rosario. Facultad de Ciencias Médicas; Argentina
Affiliation: Magni, Christian. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Biología Molecular y Celular de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Instituto de Biología Molecular y Celular de Rosario; Argentina
- Subject
- B. pumilus
- Random Forests
- Taxonomic Resolution
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/50748
View the full record metadata
| id | CONICETDig_6278f38f970732d784d5763f3902d109 |
|---|---|
| oai_identifier_str | oai:ri.conicet.gov.ar:11336/50748 |
| network_acronym_str | CONICETDig |
| repository_id_str | 3498 |
| network_name_str | CONICET Digital (CONICET) |
| dc.title.none.fl_str_mv | Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case |
| title | Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case |
| spellingShingle | Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case; Espariz, Martin; B. pumilus; Random Forests; Taxonomic Resolution |
| title_short | Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case |
| title_full | Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case |
| title_fullStr | Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case |
| title_full_unstemmed | Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case |
| title_sort | Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case |
| dc.creator.none.fl_str_mv | Espariz, Martin; Zuljan, Federico Alberto; Esteban, Luis; Magni, Christian |
| author | Espariz, Martin |
| author_facet | Espariz, Martin; Zuljan, Federico Alberto; Esteban, Luis; Magni, Christian |
| author_role | author |
| author2 | Zuljan, Federico Alberto; Esteban, Luis; Magni, Christian |
| author2_role | author; author; author |
| dc.subject.none.fl_str_mv | B. pumilus; Random Forests; Taxonomic Resolution |
| topic | B. pumilus; Random Forests; Taxonomic Resolution |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.6; https://purl.org/becyt/ford/1 |
| dc.description.none.fl_txt_mv | Bacillus pumilus group strains have been studied due to their agronomic, biotechnological or pharmaceutical potential. Classifying strains of this taxonomic group at the species level is challenging because the group comprises seven species that share over 99.5% 16S rRNA gene identity. In this study, a whole-genome in silico approach was first used to accurately demarcate B. pumilus group strains, as a case of highly phylogenetically related taxa, at the species level. To achieve this, and consequently to validate or correct the taxonomic identities of genomes in public databases, average nucleotide identity correlation, core-based phylogenomic and gene function repertoire analyses were performed. Ultimately, more than 50% of these genomes were found to be misclassified. Hierarchical clustering of gene functional repertoires was also used to infer ecotypes among B. pumilus group species. Furthermore, for the first time, the machine-learning algorithm Random Forest was used to rank genes in order of their importance for species classification. We found that ybbP, a gene involved in the synthesis of cyclic di-AMP, was the most important gene for accurately predicting species identity among B. pumilus group strains. Finally, principal component analysis was used to classify strains based on the distances between their ybbP genes. The methodologies described could be used more broadly to identify other highly phylogenetically related species in metagenomic or epidemiological assessments. Affiliation: Espariz, Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Biología Molecular y Celular de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Instituto de Biología Molecular y Celular de Rosario; Argentina. Affiliation: Zuljan, Federico Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Biología Molecular y Celular de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Instituto de Biología Molecular y Celular de Rosario; Argentina. Affiliation: Esteban, Luis. Universidad Nacional de Rosario. Facultad de Ciencias Médicas; Argentina. Affiliation: Magni, Christian. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Biología Molecular y Celular de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Instituto de Biología Molecular y Celular de Rosario; Argentina |
| description | Bacillus pumilus group strains have been studied due to their agronomic, biotechnological or pharmaceutical potential. Classifying strains of this taxonomic group at the species level is challenging because the group comprises seven species that share over 99.5% 16S rRNA gene identity. In this study, a whole-genome in silico approach was first used to accurately demarcate B. pumilus group strains, as a case of highly phylogenetically related taxa, at the species level. To achieve this, and consequently to validate or correct the taxonomic identities of genomes in public databases, average nucleotide identity correlation, core-based phylogenomic and gene function repertoire analyses were performed. Ultimately, more than 50% of these genomes were found to be misclassified. Hierarchical clustering of gene functional repertoires was also used to infer ecotypes among B. pumilus group species. Furthermore, for the first time, the machine-learning algorithm Random Forest was used to rank genes in order of their importance for species classification. We found that ybbP, a gene involved in the synthesis of cyclic di-AMP, was the most important gene for accurately predicting species identity among B. pumilus group strains. Finally, principal component analysis was used to classify strains based on the distances between their ybbP genes. The methodologies described could be used more broadly to identify other highly phylogenetically related species in metagenomic or epidemiological assessments. |
| publishDate | 2016 |
| dc.date.none.fl_str_mv | 2016-09 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/50748 Espariz, Martin; Zuljan, Federico Alberto; Esteban, Luis; Magni, Christian; Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case; Public Library of Science; Plos One; 11; 9; 9-2016; 1-17; e0163098 1932-6203 CONICET Digital CONICET |
| url | http://hdl.handle.net/11336/50748 |
| identifier_str_mv | Espariz, Martin; Zuljan, Federico Alberto; Esteban, Luis; Magni, Christian; Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case; Public Library of Science; Plos One; 11; 9; 9-2016; 1-17; e0163098 1932-6203 CONICET Digital CONICET |
| dc.language.none.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/doi/10.1371/journal.pone.0163098 info:eu-repo/semantics/altIdentifier/url/http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0163098 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf application/pdf application/pdf application/pdf |
| dc.publisher.none.fl_str_mv | Public Library of Science |
| publisher.none.fl_str_mv | Public Library of Science |
| dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| reponame_str | CONICET Digital (CONICET) |
| collection | CONICET Digital (CONICET) |
| instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
| _version_ | 1846781903628140544 |
| score | 12.982451 |