HoSeIn: A Workflow for Integrating Various Homology Search Results from Metagenomic and Metatranscriptomic Sequence Datasets
- Authors
- Rozadilla, Gastón; Moreiras Clemente, Jorgelina; McCarthy, Christina Beryl
- Year of publication
- 2020
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Data generated by metagenomic and metatranscriptomic experiments is both enormous and inherently noisy. When using taxonomy-dependent alignment-based methods to classify and label reads, the first step consists of performing homology searches against sequence databases. To obtain the most information from the samples, nucleotide sequences are usually compared to various databases (nucleotide and protein) using local sequence aligners such as BLASTN and BLASTX. Nevertheless, the analysis and integration of these results can be problematic because the outputs from these searches usually show inconsistencies, which can be especially pronounced when working with RNA-seq. Moreover, to the best of our knowledge, existing tools do not cross-reference and integrate information from the different homology searches, but provide the results of each analysis separately. We developed the HoSeIn workflow to intersect the information from these homology searches, and then determine the taxonomic and functional profile of the sample using this integrated information. The workflow is based on the assumption that the sequences that correspond to a certain taxon are composed of (i) sequences that were assigned to the same taxon by both homology searches and (ii) sequences that were assigned to that taxon by one of the homology searches but returned no hits in the other one.
Centro Regional de Estudios Genómicos
- Subject
- Exact Sciences
Metagenomics
Metatranscriptomics
Next Generation Sequencing
Homology Search
Taxonomic Profile
Functional Profile
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by/4.0/
- Repository
- SEDICI (UNLP)
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/132750
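
The integration assumption stated in the description can be illustrated with a minimal Python sketch. This is not the HoSeIn implementation: the function name `integrate`, the read IDs, and the toy taxon labels are hypothetical. Setting aside reads on which the two searches disagree is also an assumption, since the description does not say how such conflicts are resolved.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: read ID -> assigned taxon, or None for "no hits".
Assignments = Dict[str, Optional[str]]


def integrate(
    blastn: Assignments, blastx: Assignments
) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Combine two homology-search assignments under the stated assumption.

    A read is assigned to a taxon if both searches agree on it, or if one
    search assigned it and the other returned no hits.
    """
    assigned: Dict[str, str] = {}
    conflicts: List[str] = []  # searches disagree (handling is an assumption)
    no_hits: List[str] = []    # neither search returned a hit
    for read_id in sorted(set(blastn) | set(blastx)):
        n = blastn.get(read_id)
        x = blastx.get(read_id)
        if n is None and x is None:
            no_hits.append(read_id)
        elif n is None or x is None:
            # Exactly one search produced a hit: accept that assignment.
            assigned[read_id] = n if n is not None else x
        elif n == x:
            # Both searches agree on the taxon.
            assigned[read_id] = n
        else:
            conflicts.append(read_id)
    return assigned, conflicts, no_hits


# Toy data, invented purely for illustration.
blastn_hits = {"r1": "Bacteria", "r2": "Fungi", "r3": None, "r4": "Bacteria", "r5": None}
blastx_hits = {"r1": "Bacteria", "r2": None, "r3": "Viruses", "r4": "Fungi", "r5": None}

assigned, conflicts, no_hits = integrate(blastn_hits, blastx_hits)
profile = Counter(assigned.values())  # reads per taxon after integration
print(assigned, conflicts, no_hits, dict(profile))
```

Keeping conflicting reads in a separate list, rather than discarding them silently, leaves room to apply whatever resolution rule the full protocol prescribes.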
Full record metadata
| id | SEDICI_65b85254c719be78d4a3b380b0ae5aee |
|---|---|
| oai_identifier_str | oai:sedici.unlp.edu.ar:10915/132750 |
| network_acronym_str | SEDICI |
| repository_id_str | 1329 |
| network_name_str | SEDICI (UNLP) |
| dc.title.none.fl_str_mv | HoSeIn: A Workflow for Integrating Various Homology Search Results from Metagenomic and Metatranscriptomic Sequence Datasets |
| title | HoSeIn: A Workflow for Integrating Various Homology Search Results from Metagenomic and Metatranscriptomic Sequence Datasets |
| dc.creator.none.fl_str_mv | Rozadilla, Gastón; Moreiras Clemente, Jorgelina; McCarthy, Christina Beryl |
| author | Rozadilla, Gastón |
| author_facet | Rozadilla, Gastón; Moreiras Clemente, Jorgelina; McCarthy, Christina Beryl |
| author_role | author |
| author2 | Moreiras Clemente, Jorgelina; McCarthy, Christina Beryl |
| author2_role | author; author |
| dc.subject.none.fl_str_mv | Ciencias Exactas; Metagenomics; Metatranscriptomics; Next Generation Sequencing; Homology Search; Taxonomic Profile; Functional Profile |
| topic | Ciencias Exactas; Metagenomics; Metatranscriptomics; Next Generation Sequencing; Homology Search; Taxonomic Profile; Functional Profile |
| dc.description.none.fl_txt_mv | Data generated by metagenomic and metatranscriptomic experiments is both enormous and inherently noisy. When using taxonomy-dependent alignment-based methods to classify and label reads, the first step consists of performing homology searches against sequence databases. To obtain the most information from the samples, nucleotide sequences are usually compared to various databases (nucleotide and protein) using local sequence aligners such as BLASTN and BLASTX. Nevertheless, the analysis and integration of these results can be problematic because the outputs from these searches usually show inconsistencies, which can be especially pronounced when working with RNA-seq. Moreover, to the best of our knowledge, existing tools do not cross-reference and integrate information from the different homology searches, but provide the results of each analysis separately. We developed the HoSeIn workflow to intersect the information from these homology searches, and then determine the taxonomic and functional profile of the sample using this integrated information. The workflow is based on the assumption that the sequences that correspond to a certain taxon are composed of (i) sequences that were assigned to the same taxon by both homology searches and (ii) sequences that were assigned to that taxon by one of the homology searches but returned no hits in the other one.; Centro Regional de Estudios Genómicos |
| publishDate | 2020 |
| dc.date.none.fl_str_mv | 2020-07-20 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; Articulo; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/132750 |
| url | http://sedici.unlp.edu.ar/handle/10915/132750 |
| dc.language.none.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/issn/2331-8325; info:eu-repo/semantics/altIdentifier/doi/10.21769/bioprotoc.3679; info:eu-repo/semantics/altIdentifier/pmid/33659350 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0/; Creative Commons Attribution 4.0 International (CC BY 4.0) |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | http://creativecommons.org/licenses/by/4.0/; Creative Commons Attribution 4.0 International (CC BY 4.0) |
| dc.format.none.fl_str_mv | application/pdf |
| dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
| reponame_str | SEDICI (UNLP) |
| collection | SEDICI (UNLP) |
| instname_str | Universidad Nacional de La Plata |
| instacron_str | UNLP |
| institution | UNLP |
| repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
| repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |
| _version_ | 1861918857808052224 |
| score | 13.018236 |