Clustering biological data with SOMs: on topology preservation in non-linear dimensional reduction

Authors
Milone, Diego Humberto; Stegmayer, Georgina; Kamenetzky, Laura; Lopez, Mariana Gabriela; Carrari, Fernando Oscar
Publication year
2013
Language
English
Resource type
article
Status
published version
Description
Dimensional reduction is a widely used technique for exploratory analysis of large volumes of data. In biological datasets, each object is described by a large number of variables (or dimensions), and it is crucial to analyze them in a smaller space in order to extract useful information. Kohonen self-organizing maps (SOMs) have recently been proposed in systems biology as a useful tool for exploratory analysis, data integration and discovery of new relationships in *omics datasets. SOMs have traditionally been used for clustering in several data mining problems, mainly because of their ability to preserve input data topology and reduce a high-dimensional input space to a 2-D map. In spite of this, such dimensional reduction can lead to counterintuitive results. Sometimes, maps of almost the same size, trained on the same dataset with identical learning algorithms and parameters, may find different clusters. One would expect, however, that small changes in map size or in another training condition would not abruptly change the location of any of the grouped patterns. The aim of this work is to analyze and explain this issue through a real case study involving transcriptomic and metabolomic data, since it might have an important impact on the interpretation of clustering results over a biological dataset.
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación En Señales, Sistemas E Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación En Señales, Sistemas E Inteligencia Computacional; Argentina
Affiliation: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación En Señales, Sistemas E Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación En Señales, Sistemas E Inteligencia Computacional; Argentina. Centro de Investigación en Ingeniería en Sistemas de Información; Argentina
Affiliation: Kamenetzky, Laura. Instituto Nacional de Tecnología Agropecuaria; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Lopez, Mariana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Instituto Nacional de Tecnología Agropecuaria; Argentina
Affiliation: Carrari, Fernando Oscar. Instituto Nacional de Tecnología Agropecuaria; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
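The effect described in the abstract, where maps of nearly the same size group the same patterns differently, can be probed with a small experiment. The following is a minimal sketch and not the authors' method: it assumes a plain online SOM with a Gaussian neighbourhood, synthetic random data in place of the transcriptomic and metabolomic dataset, and a Rand index as the agreement measure. The names `train_som`, `assign` and `rand_index` and every parameter value are hypothetical choices made for illustration.

```python
import numpy as np

def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM with online updates (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of every unit, shape (rows*cols, 2).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise unit weights from randomly chosen samples.
    weights = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                     # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # neighbourhood radius shrinks towards 0.5
            x = data[idx]
            # Best-matching unit: the unit whose weight vector is closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighbourhood measured on the 2-D grid around the BMU.
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign(data, weights):
    """Index of the best-matching unit for every sample."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def rand_index(a, b):
    """Fraction of sample pairs that both labelings treat the same way."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float((same_a[iu] == same_b[iu]).mean())

# Synthetic stand-in data: 60 objects described by 10 variables (an assumption).
rng = np.random.default_rng(42)
data = rng.normal(size=(60, 10))

# Same data, same algorithm and parameters; only the map size differs.
labels_4 = assign(data, train_som(data, 4, 4))
labels_5 = assign(data, train_som(data, 5, 5))
agreement = rand_index(labels_4, labels_5)
print(f"pairwise agreement between 4x4 and 5x5 maps: {agreement:.3f}")
```

An agreement noticeably below 1 would indicate that pairs of patterns grouped together on one map are separated on the other, which is the kind of discrepancy the abstract refers to. On random data this only shows the mechanics of the comparison and says nothing about the paper's results.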
Subject
BIOINFORMATICS
CLUSTERING
DIMENSIONAL REDUCTION
TOPOLOGY PRESERVATION
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/18094

Subject classification URIs
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
Handle
http://hdl.handle.net/11336/18094
Citation
Milone, Diego Humberto; Stegmayer, Georgina; Kamenetzky, Laura; Lopez, Mariana Gabriela; Carrari, Fernando Oscar; Clustering biological data with SOMs: on topology preservation in non-linear dimensional reduction; Pergamon-Elsevier Science Ltd; Expert Systems with Applications; 40; 9; 7-2013; 3841-3845
ISSN
0957-4174
DOI
10.1016/j.eswa.2012.12.074
Publisher URL
http://www.sciencedirect.com/science/article/pii/S0957417412013152?via%3Dihub