Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development
- Authors
- Zhou, Jian*; Schor, Ignacio Esteban; Yao, Victoria; Theesfeld, Chandra L.; Marco-Ferreres, Raquel; Tadych, Alicja; Furlong, Eileen E. M.; Troyanskaya, Olga G.
- Publication year
- 2019
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organisms like Drosophila, the output of these experiments is primarily qualitative, and a high proportion of protein-coding genes and most non-coding genes lack any annotation. Accurate data-centric predictions of spatio-temporal gene expression will therefore complement current in situ hybridization efforts. Here, we applied a machine learning approach by training models on all public gene expression and chromatin data, even from whole-organism experiments, to provide genome-wide, quantitative spatio-temporal predictions for all genes. We developed structured in silico nano-dissection, a computational approach that predicts gene expression in >200 tissue-developmental stages. The algorithm integrates expression signals from a compendium of 6,378 genome-wide expression and chromatin profiling experiments in a cell lineage-aware fashion. We systematically evaluated our performance via cross-validation and experimentally confirmed 22 new predictions for four different embryonic tissues. The model also predicts complex, multi-tissue expression and developmental regulation with high accuracy. We further show the potential of applying these genome-wide predictions to extract tissue specificity signals from non-tissue-dissected experiments, and to prioritize tissues and stages for disease modeling. This resource, together with the exploratory tools, is freely available at our webserver http://find.princeton.edu, which provides a valuable tool for a range of applications, from predicting spatio-temporal expression patterns to recognizing tissue signatures from differential gene expression profiles.
Affiliation: Zhou, Jian*. University of Princeton; United States
Affiliation: Schor, Ignacio Esteban. European Molecular Biology Laboratory; Germany. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Fisiología, Biología Molecular y Neurociencias. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Fisiología, Biología Molecular y Neurociencias; Argentina
Affiliation: Yao, Victoria. University of Princeton; United States
Affiliation: Theesfeld, Chandra L. University of Princeton; United States
Affiliation: Marco-Ferreres, Raquel. European Molecular Biology Laboratory; Germany
Affiliation: Tadych, Alicja. University of Princeton; United States
Affiliation: Furlong, Eileen E. M. European Molecular Biology Laboratory; Germany
Affiliation: Troyanskaya, Olga G. University of Princeton; United States
- Subject
-
Gene expression
Gene prediction
Drosophila melanogaster
Embryo
Muscle tissue
Transcriptome analysis
Machine learning algorithms
- Access level
- open access
- Conditions of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/121206
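The OAI identifier above can be used to request this record over OAI-PMH. The following is a minimal sketch that only builds the request URL and does not contact any server. It assumes the endpoint the repository advertises for this record (http://ri.conicet.gov.ar/oai/request) is still in service; the function name `build_get_record_url` is hypothetical. The `oai_dc` metadata prefix is the baseline Dublin Core format that the OAI-PMH specification requires every repository to support.

```python
from urllib.parse import urlencode

# Both values are copied from this record. Whether the endpoint is still
# reachable is an assumption; this sketch never opens a connection.
OAI_ENDPOINT = "http://ri.conicet.gov.ar/oai/request"
OAI_IDENTIFIER = "oai:ri.conicet.gov.ar:11336/121206"


def build_get_record_url(endpoint: str, identifier: str,
                         metadata_prefix: str = "oai_dc") -> str:
    """Return an OAI-PMH GetRecord request URL for a single record.

    GetRecord takes exactly two arguments besides the verb:
    `identifier` and `metadataPrefix`.
    """
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{endpoint}?{query}"


url = build_get_record_url(OAI_ENDPOINT, OAI_IDENTIFIER)
print(url)
```

The colons and slash in the identifier are percent-encoded by `urlencode`, which is what the protocol expects for query arguments. Fetching the URL (for example with `urllib.request.urlopen`) would return an XML document if the endpoint responds.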
View the full record metadata
id | CONICETDig_a432702ed507c90df9d2b285bde0f598 |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/121206 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development |
title | Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development |
dc.creator.none.fl_str_mv | Zhou, Jian* Schor, Ignacio Esteban Yao, Victoria Theesfeld, Chandra L. Marco-Ferreres, Raquel Tadych, Alicja Furlong, Eileen E. M. Troyanskaya, Olga G. |
author | Zhou, Jian* |
author_facet | Zhou, Jian* Schor, Ignacio Esteban Yao, Victoria Theesfeld, Chandra L. Marco-Ferreres, Raquel Tadych, Alicja Furlong, Eileen E. M. Troyanskaya, Olga G. |
author_role | author |
author2 | Schor, Ignacio Esteban Yao, Victoria Theesfeld, Chandra L. Marco-Ferreres, Raquel Tadych, Alicja Furlong, Eileen E. M. Troyanskaya, Olga G. |
author2_role | author author author author author author author |
dc.subject.none.fl_str_mv | Gene expression Gene prediction Drosophila melanogaster Embryo Muscle tissue Transcriptome analysis Machine learning algorithms |
topic | Gene expression Gene prediction Drosophila melanogaster Embryo Muscle tissue Transcriptome analysis Machine learning algorithms |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.6 https://purl.org/becyt/ford/1 |
publishDate | 2019 |
dc.date.none.fl_str_mv | 2019-09 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/121206 Zhou, Jian*; Schor, Ignacio Esteban; Yao, Victoria; Theesfeld, Chandra L.; Marco-Ferreres, Raquel; et al.; Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development; Public Library of Science; Plos Genetics; 15; 9; 9-2019; 1-20 1553-7390 CONICET Digital CONICET |
url | http://hdl.handle.net/11336/121206 |
identifier_str_mv | Zhou, Jian*; Schor, Ignacio Esteban; Yao, Victoria; Theesfeld, Chandra L.; Marco-Ferreres, Raquel; et al.; Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development; Public Library of Science; Plos Genetics; 15; 9; 9-2019; 1-20 1553-7390 CONICET Digital CONICET |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://dx.plos.org/10.1371/journal.pgen.1008382 info:eu-repo/semantics/altIdentifier/doi/10.1371/journal.pgen.1008382 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf application/pdf |
dc.publisher.none.fl_str_mv | Public Library of Science |
publisher.none.fl_str_mv | Public Library of Science |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
_version_ | 1844614447800254464 |
score | 13.070432 |
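The `dc.relation.none.fl_str_mv` field in the metadata above packs two alternate identifiers into one space-separated string, each in the form `info:eu-repo/semantics/altIdentifier/<scheme>/<value>`. A minimal sketch of how such a value could be split into its scheme and identifier follows. The function name `parse_alt_identifier` is hypothetical, and splitting on whitespace assumes no identifier in the field contains a space, which holds for this record.

```python
ALT_ID_PREFIX = "info:eu-repo/semantics/altIdentifier/"

# Field value copied verbatim from this record's dc.relation entry.
RELATION = (
    "info:eu-repo/semantics/altIdentifier/url/"
    "https://dx.plos.org/10.1371/journal.pgen.1008382 "
    "info:eu-repo/semantics/altIdentifier/doi/10.1371/journal.pgen.1008382"
)


def parse_alt_identifier(value: str) -> tuple[str, str]:
    """Split one info:eu-repo altIdentifier into (scheme, identifier)."""
    if not value.startswith(ALT_ID_PREFIX):
        raise ValueError(f"not an altIdentifier: {value!r}")
    # Only the first slash after the prefix separates scheme from value;
    # the identifier itself (a URL or DOI) may contain further slashes.
    scheme, _, identifier = value[len(ALT_ID_PREFIX):].partition("/")
    if not scheme or not identifier:
        raise ValueError(f"missing scheme or identifier: {value!r}")
    return scheme, identifier


parsed = [parse_alt_identifier(item) for item in RELATION.split()]
for scheme, identifier in parsed:
    print(f"{scheme}: {identifier}")
```

Partitioning on the first slash only is the key design choice here, since both the URL and the DOI contain slashes of their own.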