Improving clustering with metabolic pathway data

Authors: Milone, Diego Humberto; Stegmayer, Georgina; Lopez, Mariana Gabriela; Kamenetzky, Laura; Carrari, Fernando
Publication year: 2014
Language: English
Resource type: article
Status: published version
Description: Background: It is common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a priori biological knowledge. This procedure helps to find functionally related patterns and to propose hypotheses about their behavior and the biological processes involved. As a result, this knowledge is used only in a second step, after the data have been clustered solely according to their expression patterns. It would therefore be useful to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. Results: A novel training algorithm for clustering is presented, which evaluates the internal biological connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances between data points and neuron centroids includes a new term based on information from well-known metabolic pathways. Standard self-organizing map (SOM) training and biologically-inspired SOM (bSOM) training were tested on two real data sets of transcripts and metabolites from the species Solanum lycopersicum and Arabidopsis thaliana. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The bSOM results show important improvements in convergence and performance in comparison to standard SOM training, in particular from the application point of view. Conclusions: Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. It is worth highlighting that this has effectively improved the results, which can simplify their further analysis.
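The abstract states only that the distance between data points and neuron centroids gains a term based on metabolic pathway information; it does not give the formula. The following is a minimal sketch of that general idea, not the authors' formulation. The function names, the binary shared-pathway penalty, and the `weight` parameter are all assumptions made for illustration.

```python
import math


def expression_distance(x, w):
    """Euclidean distance between a data point and a neuron centroid."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def pathway_penalty(point_pathways, neuron_pathways):
    """Hypothetical biological term (an assumption, not from the paper):
    0 if the point shares at least one pathway with the neuron, else 1."""
    return 0.0 if point_pathways & neuron_pathways else 1.0


def best_matching_unit(x, point_pathways, centroids, neuron_pathways, weight=0.5):
    """Index of the neuron that minimizes expression distance plus the
    weighted pathway penalty. weight=0 reduces to the standard SOM choice."""
    scores = [
        expression_distance(x, w) + weight * pathway_penalty(point_pathways, p)
        for w, p in zip(centroids, neuron_pathways)
    ]
    return min(range(len(scores)), key=scores.__getitem__)


# Toy example: two neurons, each tagged with the pathways of its members.
centroids = [[0.0, 0.0], [1.0, 1.0]]
neuron_pathways = [{"glycolysis"}, {"TCA cycle"}]
point = [0.45, 0.45]
point_pathways = {"TCA cycle"}

standard_choice = best_matching_unit(
    point, point_pathways, centroids, neuron_pathways, weight=0.0
)
biological_choice = best_matching_unit(
    point, point_pathways, centroids, neuron_pathways, weight=0.5
)
```

In this toy case the point is slightly closer to the first neuron in expression space, so the unweighted choice is neuron 0, while the pathway term moves it to neuron 1, which shares its pathway. How the term is actually defined and weighted in bSOM should be taken from the published article.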
Instituto de Biotecnología
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Lopez, Mariana Gabriela. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.
Affiliation: Kamenetzky, Laura. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.
Affiliation: Carrari, Fernando Oscar. Instituto Nacional de Tecnología Agropecuaria (INTA). Instituto de Biotecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.
Source: BMC Bioinformatics 15 : 101 (2014)
Subject: Bioinformatics
Data
Clustering
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
Institution: Instituto Nacional de Tecnología Agropecuaria
OAI identifier: oai:localhost:20.500.12123/4292


id	INTADig_1e2e70f167efb3968cc02d4ffaaaf401
oai_identifier_str	oai:localhost:20.500.12123/4292
network_acronym_str	INTADig
repository_id_str	l
network_name_str	INTA Digital (INTA)
dc.title.none.fl_str_mv	Improving clustering with metabolic pathway data
dc.creator.none.fl_str_mv	Milone, Diego Humberto Stegmayer, Georgina Lopez, Mariana Gabriela Kamenetzky, Laura Carrari, Fernando
dc.subject.none.fl_str_mv	Bioinformática Datos Bioinformatics Data Agrupamiento Clustering
publishDate	2014
dc.date.none.fl_str_mv	2014-04 2019-01-18T12:45:32Z 2019-01-18T12:45:32Z
dc.type.none.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo
format	article
status_str	publishedVersion
dc.identifier.none.fl_str_mv	https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-101 http://hdl.handle.net/20.500.12123/4292 1471-2105 https://doi.org/10.1186/1471-2105-15-101
url	https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-101 http://hdl.handle.net/20.500.12123/4292 https://doi.org/10.1186/1471-2105-15-101
identifier_str_mv	1471-2105
dc.language.none.fl_str_mv	eng
language	eng
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	BMC
publisher.none.fl_str_mv	BMC
dc.source.none.fl_str_mv	BMC Bioinformatics 15 : 101 (2014) reponame:INTA Digital (INTA) instname:Instituto Nacional de Tecnología Agropecuaria
reponame_str	INTA Digital (INTA)
collection	INTA Digital (INTA)
instname_str	Instituto Nacional de Tecnología Agropecuaria
repository.name.fl_str_mv	INTA Digital (INTA) - Instituto Nacional de Tecnología Agropecuaria
repository.mail.fl_str_mv	tripaldi.nicolas@inta.gob.ar
