Clustering biological data with SOMs: on topology preservation in non-linear dimensional reduction
- Authors
- Milone, Diego Humberto; Stegmayer, Georgina; Kamenetzky, Laura; Lopez, Mariana Gabriela; Carrari, Fernando Oscar
- Publication year
- 2013
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Dimensional reduction is a widely used technique for exploratory analysis of large volumes of data. In biological datasets, each object is described by a large number of variables (or dimensions), and it is crucial to analyze them in a smaller space in order to extract useful information. Kohonen self-organizing maps (SOMs) have recently been proposed in systems biology as a useful tool for exploratory analysis, data integration and discovery of new relationships in *omics datasets. SOMs have traditionally been used for clustering in several data mining problems, mainly due to their ability to preserve the topology of the input data and to reduce a high-dimensional input space to a 2-D map. In spite of this, such dimensional reduction can lead to counterintuitive results. Sometimes, maps of almost the same size, trained on the same dataset with identical learning algorithms and parameters, find different clusters. One would expect, however, that small changes in map size or in another training condition would not abruptly change the location of any of the grouped patterns. The aim of this work is to analyze and explain this issue through a real case study involving transcriptomic and metabolomic data, since it might have an important impact on the interpretation of clustering results over a biological dataset.
Fil: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación En Señales, Sistemas E Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hidricas. Instituto de Investigación En Señales, Sistemas E Inteligencia Computacional; Argentina
Fil: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación En Señales, Sistemas E Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hidricas. Instituto de Investigación En Señales, Sistemas E Inteligencia Computacional; Argentina. Centro de Investigación en Ingeniería en Sistemas de Información; Argentina
Fil: Kamenetzky, Laura. Instituto Nacional de Tecnología Agropecuaria; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Fil: Lopez, Mariana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Instituto Nacional de Tecnología Agropecuaria; Argentina
Fil: Carrari, Fernando Oscar. Instituto Nacional de Tecnología Agropecuaria; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
- Subject
- BIOINFORMATICS
CLUSTERING
DIMENSIONAL REDUCTION
TOPOLOGY PRESERVATION
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/18094
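
The effect described in the abstract above (maps of almost the same size, trained on the same data, grouping patterns differently) can be explored with a small experiment. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it uses a plain online SOM written with NumPy, synthetic Gaussian data in place of the transcriptomic and metabolomic measurements, and hypothetical helper names (`train_som`, `assign`, `pair_agreement`). It trains a 4x4 and a 5x5 map and measures on what fraction of pattern pairs the two maps agree about "same unit" versus "different unit".

```python
import numpy as np


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rectangular SOM online; returns weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    n_units = rows * cols
    # 2-D grid coordinates of every map unit.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Assumption: initialise the weights from randomly drawn data points.
    weights = data[rng.integers(0, len(data), n_units)].astype(float)
    sigma0 = max(rows, cols) / 2.0
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                # learning rate decays linearly
            sigma = sigma0 + (0.5 - sigma0) * frac  # neighbourhood radius shrinks towards 0.5
            x = data[idx]
            # Best-matching unit: the unit whose weight vector is closest to x.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Gaussian neighbourhood measured on the map grid, not in input space.
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign(data, weights):
    """Index of the best-matching unit for every pattern."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def pair_agreement(labels_a, labels_b):
    """Rand index: fraction of pattern pairs on which two groupings agree."""
    same_a = labels_a[:, None] == labels_a[None, :]
    same_b = labels_b[:, None] == labels_b[None, :]
    iu = np.triu_indices(len(labels_a), k=1)
    return float(np.mean(same_a[iu] == same_b[iu]))


# Synthetic stand-in for a biological dataset: 60 patterns, 10 variables.
data = np.random.default_rng(1).normal(size=(60, 10))
labels_4x4 = assign(data, train_som(data, 4, 4))
labels_5x5 = assign(data, train_som(data, 5, 5))
print("pair agreement between 4x4 and 5x5 maps:", pair_agreement(labels_4x4, labels_5x5))
```

An agreement well below 1.0 would indicate that some patterns grouped together on one map are separated on the other. The sketch only quantifies that disagreement; it does not explain it, which is the subject of the article itself.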
id |
CONICETDig_e68beb11c5ce6a7eb821e7d3945de6aa |
---|---|
oai_identifier_str |
oai:ri.conicet.gov.ar:11336/18094 |
network_acronym_str |
CONICETDig |
repository_id_str |
3498 |
network_name_str |
CONICET Digital (CONICET) |
dc.title.none.fl_str_mv |
Clustering biological data with SOMs: on topology preservation in non-linear dimensional reduction |
dc.creator.none.fl_str_mv |
Milone, Diego Humberto Stegmayer, Georgina Kamenetzky, Laura Lopez, Mariana Gabriela Carrari, Fernando Oscar |
author |
Milone, Diego Humberto |
author_role |
author |
author2 |
Stegmayer, Georgina Kamenetzky, Laura Lopez, Mariana Gabriela Carrari, Fernando Oscar |
author2_role |
author author author author |
dc.subject.none.fl_str_mv |
BIOINFORMATICS CLUSTERING DIMENSIONAL REDUCTION TOPOLOGY PRESERVATION |
purl_subject.fl_str_mv |
https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
publishDate |
2013 |
dc.date.none.fl_str_mv |
2013-07 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11336/18094 Milone, Diego Humberto; Stegmayer, Georgina; Kamenetzky, Laura; Lopez, Mariana Gabriela; Carrari, Fernando Oscar; Clustering biological data with SOMs: on topology preservation in non-linear dimensional reduction; Pergamon-Elsevier Science Ltd; Expert Systems with Applications; 40; 9; 7-2013; 3841-3845 0957-4174 |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/doi/10.1016/j.eswa.2012.12.074 info:eu-repo/semantics/altIdentifier/url/http://www.sciencedirect.com/science/article/pii/S0957417412013152?via%3Dihub |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv |
application/pdf application/pdf application/pdf application/pdf application/pdf |
dc.publisher.none.fl_str_mv |
Pergamon-Elsevier Science Ltd |
publisher.none.fl_str_mv |
Pergamon-Elsevier Science Ltd |
dc.source.none.fl_str_mv |
reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv |
CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv |
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |