HoSeIn: A Workflow for Integrating Various Homology Search Results from Metagenomic and Metatranscriptomic Sequence Datasets

Authors
Rozadilla, Gastón; Moreiras Clemente, Jorgelina; McCarthy, Christina Beryl
Publication year
2020
Language
English
Resource type
article
Status
published version
Description
Data generated by metagenomic and metatranscriptomic experiments are both voluminous and inherently noisy. When taxonomy-dependent, alignment-based methods are used to classify and label reads, the first step is to perform homology searches against sequence databases. To obtain the most information from the samples, nucleotide sequences are usually compared to several databases (nucleotide and protein) using local sequence aligners such as BLASTN and BLASTX. Nevertheless, analyzing and integrating these results can be problematic because the outputs of these searches often disagree, and the discrepancies can be especially pronounced with RNA-seq data. Moreover, to the best of our knowledge, existing tools do not cross-reference and integrate information from the different homology searches; they report the results of each analysis separately. We developed the HoSeIn workflow to intersect the information from these homology searches and then determine the taxonomic and functional profile of the sample from the integrated information. The workflow assumes that the sequences belonging to a given taxon comprise (i) sequences assigned to that taxon by both homology searches and (ii) sequences assigned to that taxon by one of the homology searches that returned no hits in the other.
Centro Regional de Estudios Genómicos
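The two-part assumption stated in the description can be illustrated with a minimal Python sketch. This is not the HoSeIn implementation: the function name `integrate_assignments`, the dictionary inputs (read ID mapped to a taxon label, or `None` for "no hits"), and the example taxon labels are all hypothetical. The description does not say how reads with conflicting assignments are handled, so this sketch simply leaves them unassigned.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


def integrate_assignments(
    blastn: Dict[str, Optional[str]],
    blastx: Dict[str, Optional[str]],
) -> Dict[str, Set[str]]:
    """Group read IDs by taxon using the two-part rule from the description.

    Hypothetical input format: each dict maps a read ID to the taxon assigned
    by one homology search, or to None when that search returned no hits.
    """
    by_taxon: Dict[str, Set[str]] = defaultdict(set)
    for read_id in set(blastn) | set(blastx):
        taxon_n = blastn.get(read_id)  # None also covers reads absent from a search
        taxon_x = blastx.get(read_id)
        if taxon_n is not None and taxon_n == taxon_x:
            # (i) both searches assigned the same taxon
            by_taxon[taxon_n].add(read_id)
        elif taxon_n is not None and taxon_x is None:
            # (ii) assigned by the nucleotide search, no hits in the protein search
            by_taxon[taxon_n].add(read_id)
        elif taxon_x is not None and taxon_n is None:
            # (ii) assigned by the protein search, no hits in the nucleotide search
            by_taxon[taxon_x].add(read_id)
        # Conflicting assignments and reads with no hits in either search
        # are left unassigned here (an assumption of this sketch).
    return dict(by_taxon)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    blastn_hits = {"r1": "Bacteria", "r2": "Fungi", "r3": None, "r4": "Bacteria"}
    blastx_hits = {"r1": "Bacteria", "r2": None, "r3": "Fungi", "r4": "Fungi"}
    for taxon, reads in sorted(integrate_assignments(blastn_hits, blastx_hits).items()):
        print(taxon, sorted(reads))
```

In the made-up example, `r1` satisfies condition (i), `r2` and `r3` satisfy condition (ii), and `r4` is left out because the two searches disagree.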
Subject
Exact Sciences
Metagenomics
Metatranscriptomics
Next Generation Sequencing
Homology Search
Taxonomic Profile
Functional Profile
Access level
open access
Terms of use
http://creativecommons.org/licenses/by/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/132750

Publication date
2020-07-20
Handle
http://sedici.unlp.edu.ar/handle/10915/132750
Related identifiers
ISSN: 2331-8325
DOI: 10.21769/bioprotoc.3679
PMID: 33659350
Format
application/pdf
Repository contact
alira@sedici.unlp.edu.ar