A Pan-cancer Somatic Mutation Embedding using Autoencoders
- Authors
- Palazzo, Martin; Beauseroy, Pierre; Yankilevich, Patricio
- Publication year
- 2019
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Background: Next generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes. The increasing availability of tumor data makes it possible to study the complexity of cancer with machine learning methods. The large available repositories of high dimensional tumor samples characterised with germline and somatic mutation data require advanced computational modelling for data interpretation. In this work, we propose to analyze this complex data with neural network learning, a methodology that has made impressive advances in image and natural language processing. Results: Here we present a tumor mutation profile analysis pipeline based on an autoencoder model, which is used to discover better representations of lower dimensionality from large somatic mutation data of 40 different tumor types and subtypes. Kernel learning with hierarchical cluster analysis is used to assess the quality of the learned somatic mutation embedding, on which support vector machine models are used to accurately classify tumor subtypes. Conclusions: The learned latent space maps the original samples to a much lower dimension while keeping the biological signals from the original tumor samples. This pipeline and the resulting embedding allow easier exploration of the heterogeneity within and across tumor types and accurate classification of tumor samples in the pan-cancer somatic mutation landscape.
Affiliation: Palazzo, Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina. Universidad Tecnológica Nacional; Argentina
Affiliation: Beauseroy, Pierre. Université de Technologie de Troyes; France
Affiliation: Yankilevich, Patricio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina
- Subject
- AUTOENCODER; CANCER GENOMICS; KERNEL LEARNING
- Accessibility level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/124571
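The OAI identifier above can be used to request this record over OAI-PMH. The following is a minimal sketch, not the repository's documented client: it assumes the endpoint is http://ri.conicet.gov.ar/oai/request and that the repository serves the standard `oai_dc` metadata format. The function name `build_get_record_url` is hypothetical. The code only builds the request URL and performs no network access.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: OAI-PMH endpoint of the repository (not confirmed by the summary above).
OAI_BASE_URL = "http://ri.conicet.gov.ar/oai/request"


def build_get_record_url(identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Return an OAI-PMH GetRecord request URL for a single record identifier."""
    # GetRecord takes exactly these three arguments in the OAI-PMH protocol.
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{OAI_BASE_URL}?{query}"


# Identifier taken from this record.
url = build_get_record_url("oai:ri.conicet.gov.ar:11336/124571")

# Round-trip check: the encoded query decodes back to the original values.
params = parse_qs(urlsplit(url).query)
print(url)
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) should return an XML document whose Dublin Core fields correspond to the metadata listed on this page, provided the endpoint assumption holds.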
id |
CONICETDig_2bf8f75895656d319ad181f12b90720b |
---|---|
oai_identifier_str |
oai:ri.conicet.gov.ar:11336/124571 |
network_acronym_str |
CONICETDig |
repository_id_str |
3498 |
network_name_str |
CONICET Digital (CONICET) |
dc.title.none.fl_str_mv |
A Pan-cancer Somatic Mutation Embedding using Autoencoders |
dc.creator.none.fl_str_mv |
Palazzo, Martin Beauseroy, Pierre Yankilevich, Patricio |
author |
Palazzo, Martin |
author_facet |
Palazzo, Martin Beauseroy, Pierre Yankilevich, Patricio |
author_role |
author |
author2 |
Beauseroy, Pierre Yankilevich, Patricio |
author2_role |
author author |
dc.subject.none.fl_str_mv |
AUTOENCODER CANCER GENOMICS KERNEL LEARNING |
purl_subject.fl_str_mv |
https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
publishDate |
2019 |
dc.date.none.fl_str_mv |
2019-12 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11336/124571 Palazzo, Martin; Beauseroy, Pierre; Yankilevich, Patricio; A Pan-cancer Somatic Mutation Embedding using Autoencoders; BioMed Central; BMC Bioinformatics; 20; 1; 12-2019; 1-10 1471-2105 CONICET Digital CONICET |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3298-z info:eu-repo/semantics/altIdentifier/doi/10.1186/s12859-019-3298-z |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv |
application/pdf application/pdf |
dc.publisher.none.fl_str_mv |
BioMed Central |
publisher.none.fl_str_mv |
BioMed Central |
dc.source.none.fl_str_mv |
reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str |
CONICET Digital (CONICET) |
collection |
CONICET Digital (CONICET) |
instname_str |
Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv |
CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv |
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |