A Pan-cancer Somatic Mutation Embedding using Autoencoders

Authors
Palazzo, Martin; Beauseroy, Pierre; Yankilevich, Patricio
Publication year
2019
Language
English
Resource type
article
Status
published version
Description
Background: Next-generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes. The increasing availability of tumor data makes it possible to study the complexity of cancer with machine learning methods. The large repositories of high-dimensional tumor samples characterised by germline and somatic mutation data require advanced computational modelling for their interpretation. In this work, we propose to analyze this complex data with neural network learning, a methodology that has made impressive advances in image and natural language processing. Results: Here we present a tumor mutation profile analysis pipeline based on an autoencoder model, which is used to discover better, lower-dimensional representations of large-scale somatic mutation data from 40 different tumor types and subtypes. Kernel learning combined with hierarchical cluster analysis is used to assess the quality of the learned somatic mutation embedding, on which support vector machine models are trained to accurately classify tumor subtypes. Conclusions: The learned latent space maps the original samples into a much lower dimension while preserving the biological signals of the original tumor samples. This pipeline and the resulting embedding allow easier exploration of the heterogeneity within and across tumor types, as well as accurate classification of tumor samples in the pan-cancer somatic mutation landscape.
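The Results paragraph describes compressing high-dimensional somatic mutation profiles into a low-dimensional latent space with an autoencoder. The sketch below only illustrates that general idea and is not the authors' architecture or training setup. It assumes a samples-by-genes 0/1 mutation matrix, a single sigmoid hidden layer whose `n_latent` units form the embedding, a binary cross-entropy reconstruction loss and plain full-batch gradient descent in NumPy; the function names, hyperparameters and the random placeholder data are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_latent=8, lr=1.0, epochs=500, seed=0):
    """Hypothetical single-hidden-layer autoencoder for a binary
    samples-by-genes mutation matrix X (1 = gene mutated in that sample).

    Encoder: H = sigmoid(X @ W1 + b1)      -> latent embedding
    Decoder: X_hat = sigmoid(H @ W2 + b2)  -> reconstructed profile
    Loss: mean binary cross-entropy, minimised by full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    W1 = rng.normal(0.0, 0.1, (n_genes, n_latent))
    b1 = np.zeros(n_latent)
    W2 = rng.normal(0.0, 0.1, (n_latent, n_genes))
    b2 = np.zeros(n_genes)
    eps = 1e-9  # avoids log(0) in the loss
    losses = []

    for _ in range(epochs):
        # Forward pass: encode, then reconstruct
        H = sigmoid(X @ W1 + b1)
        X_hat = sigmoid(H @ W2 + b2)
        loss = -np.mean(X * np.log(X_hat + eps) + (1 - X) * np.log(1 - X_hat + eps))
        losses.append(loss)

        # Backward pass: for sigmoid + mean BCE, dLoss/dlogits = (X_hat - X) / X.size
        dZ2 = (X_hat - X) / X.size
        dW2 = H.T @ dZ2
        db2 = dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * H * (1 - H)  # chain rule through the hidden sigmoid
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0)

        # In-place gradient-descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def encode(X_new):
        """Map mutation profiles to the learned low-dimensional embedding."""
        return sigmoid(X_new @ W1 + b1)

    return encode, losses


# Placeholder data: 100 samples x 50 genes, roughly 10% of entries mutated
rng = np.random.default_rng(42)
X = (rng.random((100, 50)) < 0.1).astype(float)
encode, losses = train_autoencoder(X, n_latent=8)
Z = encode(X)
print(Z.shape)  # (100, 8)
```

The rows of `Z` are the kind of embedding that downstream steps such as hierarchical clustering or a kernel classifier could consume. Real pan-cancer mutation matrices are much wider and sparser, so a practical model would probably benefit from a deep-learning framework, mini-batch training and regularisation.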
Affiliation: Palazzo, Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina. Universidad Tecnológica Nacional; Argentina
Affiliation: Beauseroy, Pierre. Université de Technologie de Troyes; France
Affiliation: Yankilevich, Patricio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina
Subject
AUTOENCODER
CANCER GENOMICS
KERNEL LEARNING
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/124571

Handle
http://hdl.handle.net/11336/124571
Citation
Palazzo, Martin; Beauseroy, Pierre; Yankilevich, Patricio; A Pan-cancer Somatic Mutation Embedding using Autoencoders; BioMed Central; BMC Bioinformatics; 20; 1; 12-2019; 1-10
ISSN
1471-2105
DOI
10.1186/s12859-019-3298-z
Publisher URL
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3298-z
Format
application/pdf
Publisher
BioMed Central