Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case

Authors
Espariz, Martin; Zuljan, Federico Alberto; Esteban, Luis; Magni, Christian
Publication year
2016
Language
English
Resource type
article
Status
published version
Description
Bacillus pumilus group strains have been studied due to their agronomic, biotechnological or pharmaceutical potential. Classifying strains of this taxonomic group at the species level is challenging because the group comprises seven species that share over 99.5% 16S rRNA gene identity. In this study, a whole-genome in silico approach was first used to accurately demarcate B. pumilus group strains, as a case of highly phylogenetically related taxa, at the species level. To achieve this, and thereby validate or correct the taxonomic identities of genomes in public databases, an average nucleotide identity correlation analysis, a core-based phylogenomic analysis and a gene function repertoire analysis were performed. More than 50% of these genomes were found to be misclassified. Hierarchical clustering of gene functional repertoires was also used to infer ecotypes among B. pumilus group species. Furthermore, for the first time, the machine-learning algorithm Random Forest was used to rank genes in order of their importance for species classification. We found that ybbP, a gene involved in the synthesis of cyclic di-AMP, was the most important gene for accurately predicting species identity among B. pumilus group strains. Finally, principal component analysis was used to classify strains based on the distances between their ybbP genes. The methodologies described could be applied more broadly to identify other highly phylogenetically related species in metagenomic or epidemiological assessments.
Affiliation: Espariz, Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Biología Molecular y Celular de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Instituto de Biología Molecular y Celular de Rosario; Argentina
Affiliation: Zuljan, Federico Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Biología Molecular y Celular de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Instituto de Biología Molecular y Celular de Rosario; Argentina
Affiliation: Esteban, Luis. Universidad Nacional de Rosario. Facultad de Ciencias Médicas; Argentina
Affiliation: Magni, Christian. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Biología Molecular y Celular de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Instituto de Biología Molecular y Celular de Rosario; Argentina
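The abstract above mentions demarcating species from average nucleotide identity (ANI) and correcting database labels. A minimal sketch of that idea follows, under loudly stated assumptions: the strain names, ANI values and labels are hypothetical toy data (not results from the study), the 95% cutoff is the commonly cited ANI species boundary rather than a value taken from this record, and the single-linkage grouping is an illustrative stand-in, not the authors' correlation-based procedure.

```python
from collections import Counter

# Hypothetical pairwise ANI values in percent; NOT data from the study.
TOY_ANI = {
    ("strain_A", "strain_B"): 98.7,
    ("strain_A", "strain_C"): 91.2,
    ("strain_A", "strain_D"): 91.0,
    ("strain_B", "strain_C"): 90.8,
    ("strain_B", "strain_D"): 90.5,
    ("strain_C", "strain_D"): 97.5,
}

# Hypothetical database labels; strain_D is deliberately mislabeled.
TOY_LABELS = {
    "strain_A": "species_1",
    "strain_B": "species_1",
    "strain_C": "species_2",
    "strain_D": "species_1",
}


def group_by_ani(ani, threshold=95.0):
    """Single-linkage grouping: join genomes linked by ANI >= threshold."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), value in ani.items():
        root_a, root_b = find(a), find(b)
        if value >= threshold and root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for name in parent:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in groups.values())


def flag_label_conflicts(groups, labels):
    """Return genomes whose label is a strict minority within their ANI group."""
    flagged = []
    for members in groups:
        counts = Counter(labels[m] for m in members)
        top_label, top_count = counts.most_common(1)[0]
        if top_count * 2 > len(members):  # only act on a clear majority
            flagged.extend(m for m in members if labels[m] != top_label)
    return sorted(flagged)


groups = group_by_ani(TOY_ANI)
print(groups)
print(flag_label_conflicts(groups, TOY_LABELS))
```

In this toy data, strain_C and strain_D fall into one group but carry different labels with no majority, so the sketch flags nothing; resolving such ties would need a reference (type strain) genome in each group.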
Subject
B. pumilus
Random Forests
Taxonomic Resolution
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/50748

Subject classification
https://purl.org/becyt/ford/1.6
https://purl.org/becyt/ford/1
Publication date
2016-09
Handle
http://hdl.handle.net/11336/50748
Citation
Espariz, Martin; Zuljan, Federico Alberto; Esteban, Luis; Magni, Christian; Taxonomic identity resolution of highly phylogenetically related strains and selection of phylogenetic markers by using genome-scale methods: The Bacillus pumilus group case; Public Library of Science; PLoS ONE; 11; 9; 9-2016; 1-17; e0163098
ISSN
1932-6203
DOI
10.1371/journal.pone.0163098
Publisher URL
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0163098
Publisher
Public Library of Science