A Quantitative Profiling Tool for Diverse Genomic Data Types Reveals Potential Associations between Chromatin and Pre-mRNA Processing
- Authors
- Kremsky, Isaac; Bellora, Nicolás; Eyras, Eduardo
- Publication year
- 2015
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- High-throughput sequencing, and genome-based datasets in general, are often represented as profiles centered at reference points to study the association of protein binding and other signals to particular regulatory mechanisms. Although these profiles often provide compelling evidence of these associations, they do not provide a quantitative assessment of the enrichment, which makes the comparison between signals and conditions difficult. In addition, a number of biases can confound profiles, but are rarely accounted for in the tools currently available. We present a novel computational method, ProfileSeq, for the quantitative assessment of biological profiles to provide an exact, nonparametric test that specific regions of the test profile have higher or lower signal densities than a control set. The method is applicable to high-throughput sequencing data (ChIP-Seq, GRO-Seq, CLIP-Seq, etc.) and to genome-based datasets (motifs, etc.). We validate ProfileSeq by recovering and providing a quantitative assessment of several results reported before in the literature using independent datasets. We show that input signal and mappability have confounding effects on the profile results, but that normalizing the signal by input reads can eliminate these biases while preserving the biological signal. Moreover, we apply ProfileSeq to ChIP-Seq data for transcription factors, as well as for motif and CLIP-Seq data for splicing factors. In all examples considered, the profiles were robust to biases in mappability of sequencing reads. Furthermore, analyses performed with ProfileSeq reveal a number of putative relationships between transcription factor binding to DNA and splicing factor binding to pre-mRNA, adding to the growing body of evidence relating chromatin and pre-mRNA processing. ProfileSeq provides a robust way to quantify genome-wide coordinate-based signal. 
Software and documentation are freely available for academic use at https://bitbucket.org/regulatorygenomicsupf/profileseq/.
Affiliation: Kremsky, Isaac. Universitat Pompeu Fabra; Spain
Affiliation: Bellora, Nicolás. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Patagonia Norte. Instituto de Investigación En Biodiversidad y Medioambiente; Argentina. Universidad Nacional del Comahue. Centro Regional Universidad de Bariloche. Departamento de Biologia. Laboratorio de Microbiologia Aplicada y Biotecnologia; Argentina
Affiliation: Eyras, Eduardo. Institució Catalana de Recerca i Estudis Avançats; Spain
- Subject
- High-throughput sequencing
- genomics
- profiling
- bioinformatics
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/12050
View the full record metadata
id |
CONICETDig_44ef56b7db344e5f10625c7cccc0b724 |
---|---|
oai_identifier_str |
oai:ri.conicet.gov.ar:11336/12050 |
network_acronym_str |
CONICETDig |
repository_id_str |
3498 |
network_name_str |
CONICET Digital (CONICET) |
dc.title.none.fl_str_mv |
A Quantitative Profiling Tool for Diverse Genomic Data Types Reveals Potential Associations between Chromatin and Pre-mRNA Processing |
dc.creator.none.fl_str_mv |
Kremsky, Isaac Bellora, Nicolás Eyras, Eduardo |
dc.subject.none.fl_str_mv |
High-throughput sequencing genomics profiling bioinformatics |
purl_subject.fl_str_mv |
https://purl.org/becyt/ford/1.6 https://purl.org/becyt/ford/1 |
publishDate |
2015 |
dc.date.none.fl_str_mv |
2015-07 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11336/12050 Kremsky, Isaac; Bellora, Nicolás; Eyras, Eduardo; A Quantitative Profiling Tool for Diverse Genomic Data Types Reveals Potential Associations between Chromatin and Pre-mRNA Processing; Public Library Of Science; Plos One; 10; 7; 7-2015; 1-29 1932-6203 |
url |
http://hdl.handle.net/11336/12050 |
dc.language.none.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132448 info:eu-repo/semantics/altIdentifier/doi/10.1371/journal.pone.0132448 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/2.5/ar/ |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
https://creativecommons.org/licenses/by/2.5/ar/ |
dc.format.none.fl_str_mv |
application/pdf application/pdf |
dc.publisher.none.fl_str_mv |
Public Library Of Science |
dc.source.none.fl_str_mv |
reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv |
CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv |
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |