Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development

Authors
Zhou, Jian*; Schor, Ignacio Esteban; Yao, Victoria; Theesfeld, Chandra L.; Marco-Ferreres, Raquel; Tadych, Alicja; Furlong, Eileen E. M.; Troyanskaya, Olga G.
Publication year
2019
Language
English
Resource type
article
Status
published version
Description
Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organisms like Drosophila, the output of these experiments is primarily qualitative, and a high proportion of protein-coding genes and most non-coding genes lack any annotation. Accurate data-centric predictions of spatio-temporal gene expression will therefore complement current in situ hybridization efforts. Here, we applied a machine learning approach by training models on all public gene expression and chromatin data, even from whole-organism experiments, to provide genome-wide, quantitative spatio-temporal predictions for all genes. We developed structured in silico nano-dissection, a computational approach that predicts gene expression in >200 tissue-developmental stages. The algorithm integrates expression signals from a compendium of 6,378 genome-wide expression and chromatin profiling experiments in a cell lineage-aware fashion. We systematically evaluated our performance via cross-validation and experimentally confirmed 22 new predictions for four different embryonic tissues. The model also predicts complex, multi-tissue expression and developmental regulation with high accuracy. We further show the potential of applying these genome-wide predictions to extract tissue specificity signals from non-tissue-dissected experiments, and to prioritize tissues and stages for disease modeling. This resource, together with the exploratory tools, is freely available at our webserver, http://find.princeton.edu, which provides a valuable tool for a range of applications, from predicting spatio-temporal expression patterns to recognizing tissue signatures from differential gene expression profiles.
Affiliation: Zhou, Jian*. University of Princeton; United States
Affiliation: Schor, Ignacio Esteban. European Molecular Biology Laboratory; Germany. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Fisiología, Biología Molecular y Neurociencias. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Fisiología, Biología Molecular y Neurociencias; Argentina
Affiliation: Yao, Victoria. University of Princeton; United States
Affiliation: Theesfeld, Chandra L. University of Princeton; United States
Affiliation: Marco-Ferreres, Raquel. European Molecular Biology Laboratory; Germany
Affiliation: Tadych, Alicja. University of Princeton; United States
Affiliation: Furlong, Eileen E. M. European Molecular Biology Laboratory; Germany
Affiliation: Troyanskaya, Olga G. University of Princeton; United States
Subjects
Gene expression
Gene prediction
Drosophila melanogaster
Embryo
Muscle tissue
Transcriptome analysis
Machine learning algorithms
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/121206
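
The OAI identifier above can be used to request this record over OAI-PMH. The following is a minimal sketch that only builds the request URL and does not contact the server. It assumes the repository's OAI endpoint is http://ri.conicet.gov.ar/oai/request and that the `oai_dc` metadata format is offered; the function name `build_get_record_url` is hypothetical and used for illustration only.

```python
from urllib.parse import urlencode

# Assumption: OAI request endpoint of the CONICET Digital repository.
OAI_BASE_URL = "http://ri.conicet.gov.ar/oai/request"


def build_get_record_url(identifier: str,
                         metadata_prefix: str = "oai_dc",
                         base_url: str = OAI_BASE_URL) -> str:
    """Return an OAI-PMH GetRecord URL for a single record (hypothetical helper)."""
    # GetRecord takes exactly these three arguments in OAI-PMH 2.0.
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


url = build_get_record_url("oai:ri.conicet.gov.ar:11336/121206")
print(url)
```

The colons and the slash in the identifier are percent-encoded by `urlencode`, which is what a query string requires. Fetching the resulting URL should return an XML response if the endpoint and metadata format are as assumed.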

id CONICETDig_a432702ed507c90df9d2b285bde0f598
network_acronym_str CONICETDig
repository_id_str 3498
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.6
https://purl.org/becyt/ford/1
dc.date.none.fl_str_mv 2019-09
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/121206
Zhou, Jian*; Schor, Ignacio Esteban; Yao, Victoria; Theesfeld, Chandra L.; Marco-Ferreres, Raquel; et al.; Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development; Public Library of Science; Plos Genetics; 15; 9; 9-2019; 1-20
1553-7390
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://dx.plos.org/10.1371/journal.pgen.1008382
info:eu-repo/semantics/altIdentifier/doi/10.1371/journal.pgen.1008382
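
The two relation values above follow the pattern `info:eu-repo/semantics/altIdentifier/<scheme>/<value>`. As a minimal sketch, assuming only the `doi` and `url` schemes seen in this record need handling, they can be turned into resolvable links as follows. The helper names `parse_alt_identifier` and `to_resolvable_url` are hypothetical.

```python
from typing import Tuple

PREFIX = "info:eu-repo/semantics/altIdentifier/"


def parse_alt_identifier(value: str) -> Tuple[str, str]:
    """Split an altIdentifier string into (scheme, identifier)."""
    if not value.startswith(PREFIX):
        raise ValueError(f"not an altIdentifier: {value!r}")
    # The scheme is the first path segment after the prefix; the rest is the value.
    scheme, _, identifier = value[len(PREFIX):].partition("/")
    return scheme, identifier


def to_resolvable_url(value: str) -> str:
    """Map an altIdentifier to a URL (only 'doi' and 'url' are handled here)."""
    scheme, identifier = parse_alt_identifier(value)
    if scheme == "doi":
        return "https://doi.org/" + identifier
    if scheme == "url":
        return identifier
    raise ValueError(f"unsupported scheme: {scheme!r}")


doi_link = to_resolvable_url(
    "info:eu-repo/semantics/altIdentifier/doi/10.1371/journal.pgen.1008382")
url_link = to_resolvable_url(
    "info:eu-repo/semantics/altIdentifier/url/https://dx.plos.org/10.1371/journal.pgen.1008382")
print(doi_link)
print(url_link)
```

Other schemes would raise `ValueError` in this sketch rather than be guessed at.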
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Public Library of Science
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar