Clustering using PK-D: A connectivity and density dissimilarity

Autores
Baya, Ariel Emilio; Larese, Monica Graciela; Granitto, Pablo Miguel
Año de publicación
2016
Idioma
inglés
Tipo de recurso
artículo
Estado
versión publicada
Descripción
We present a new dissimilarity, which combines connectivity and density information. Usually, connectivity and density are conceived as mutually exclusive concepts; however, we discuss a novel procedure to merge both information sources. Once we have calculated the new dissimilarity, we apply MDS in order to find a low dimensional vector space representation. The new data representation can be used for clustering and data visualization, which is not pursued in this paper. Instead we use clustering to estimate the gain from our approach consisting of dissimilarity + MDS. Hence, we analyze the partitions' quality obtained by clustering high dimensional data with various well known clustering algorithms based on density, connectivity and message passing, as well as simple algorithms like k-means and Hierarchical Clustering (HC). The quality gap between the partitions found by k-means and HC alone compared to k-means and HC using our new low dimensional vector space representation is remarkable. Moreover, our tests using high dimensional gene expression and image data confirm these results and show a steady performance, which surpasses spectral clustering and other algorithms relevant to our work.
Fil: Baya, Ariel Emilio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
Fil: Larese, Monica Graciela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
Fil: Granitto, Pablo Miguel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
Materia
Clustering
Dimensionality Reduction
Nivel de accesibilidad
acceso abierto
Condiciones de uso
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repositorio
CONICET Digital (CONICET)
Institución
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI Identificador
oai:ri.conicet.gov.ar:11336/52658

id CONICETDig_4062cf34a41a9e06e7a8fb46e8a58f3e
oai_identifier_str oai:ri.conicet.gov.ar:11336/52658
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
spelling Clustering using PK-D: A connectivity and density dissimilarityBaya, Ariel EmilioLarese, Monica GracielaGranitto, Pablo MiguelClusteringDimensionality Reductionhttps://purl.org/becyt/ford/1.2https://purl.org/becyt/ford/1We present a new dissimilarity, which combines connectivity and density information. Usually, connectivity and density are conceived as mutually exclusive concepts; however, we discuss a novel procedure to merge both information sources. Once we have calculated the new dissimilarity, we apply MDS in order to find a low dimensional vector space representation. The new data representation can be used for clustering and data visualization, which is not pursued in this paper. Instead we use clustering to estimate the gain from our approach consisting of dissimilarity + MDS. Hence, we analyze the partitions' quality obtained by clustering high dimensional data with various well known clustering algorithms based on density, connectivity and message passing, as well as simple algorithms like k-means and Hierarchical Clustering (HC). The quality gap between the partitions found by k-means and HC alone compared to k-means and HC using our new low dimensional vector space representation is remarkable. Moreover, our tests using high dimensional gene expression and image data confirm these results and show a steady performance, which surpasses spectral clustering and other algorithms relevant to our work.Fil: Baya, Ariel Emilio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Larese, Monica Graciela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Granitto, Pablo Miguel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaPergamon-Elsevier Science Ltd2016-06info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionhttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdfapplication/zipapplication/zipapplication/pdfhttp://hdl.handle.net/11336/52658Baya, Ariel Emilio; Larese, Monica Graciela; Granitto, Pablo Miguel; Clustering using PK-D: A connectivity and density dissimilarity; Pergamon-Elsevier Science Ltd; Expert Systems with Applications; 51; 6-2016; 151-1600957-4174CONICET DigitalCONICETenginfo:eu-repo/semantics/altIdentifier/doi/10.1016/j.eswa.2015.12.037info:eu-repo/semantics/altIdentifier/url/https://www.sciencedirect.com/science/article/pii/S0957417415008453info:eu-repo/semantics/openAccesshttps://creativecommons.org/licenses/by-nc-sa/2.5/ar/reponame:CONICET Digital (CONICET)instname:Consejo Nacional de Investigaciones Científicas y Técnicas2025-09-29T09:58:40Zoai:ri.conicet.gov.ar:11336/52658instacron:CONICETInstitucionalhttp://ri.conicet.gov.ar/Organismo científico-tecnológicoNo correspondehttp://ri.conicet.gov.ar/oai/requestdasensio@conicet.gov.ar; lcarlino@conicet.gov.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:34982025-09-29 09:58:41.239CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicasfalse
dc.title.none.fl_str_mv Clustering using PK-D: A connectivity and density dissimilarity
title Clustering using PK-D: A connectivity and density dissimilarity
spellingShingle Clustering using PK-D: A connectivity and density dissimilarity
Baya, Ariel Emilio
Clustering
Dimensionality Reduction
title_short Clustering using PK-D: A connectivity and density dissimilarity
title_full Clustering using PK-D: A connectivity and density dissimilarity
title_fullStr Clustering using PK-D: A connectivity and density dissimilarity
title_full_unstemmed Clustering using PK-D: A connectivity and density dissimilarity
title_sort Clustering using PK-D: A connectivity and density dissimilarity
dc.creator.none.fl_str_mv Baya, Ariel Emilio
Larese, Monica Graciela
Granitto, Pablo Miguel
author Baya, Ariel Emilio
author_facet Baya, Ariel Emilio
Larese, Monica Graciela
Granitto, Pablo Miguel
author_role author
author2 Larese, Monica Graciela
Granitto, Pablo Miguel
author2_role author
author
dc.subject.none.fl_str_mv Clustering
Dimensionality Reduction
topic Clustering
Dimensionality Reduction
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
dc.description.none.fl_txt_mv We present a new dissimilarity, which combines connectivity and density information. Usually, connectivity and density are conceived as mutually exclusive concepts; however, we discuss a novel procedure to merge both information sources. Once we have calculated the new dissimilarity, we apply MDS in order to find a low dimensional vector space representation. The new data representation can be used for clustering and data visualization, which is not pursued in this paper. Instead we use clustering to estimate the gain from our approach consisting of dissimilarity + MDS. Hence, we analyze the partitions' quality obtained by clustering high dimensional data with various well known clustering algorithms based on density, connectivity and message passing, as well as simple algorithms like k-means and Hierarchical Clustering (HC). The quality gap between the partitions found by k-means and HC alone compared to k-means and HC using our new low dimensional vector space representation is remarkable. Moreover, our tests using high dimensional gene expression and image data confirm these results and show a steady performance, which surpasses spectral clustering and other algorithms relevant to our work.
Fil: Baya, Ariel Emilio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
Fil: Larese, Monica Graciela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
Fil: Granitto, Pablo Miguel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
description We present a new dissimilarity, which combines connectivity and density information. Usually, connectivity and density are conceived as mutually exclusive concepts; however, we discuss a novel procedure to merge both information sources. Once we have calculated the new dissimilarity, we apply MDS in order to find a low dimensional vector space representation. The new data representation can be used for clustering and data visualization, which is not pursued in this paper. Instead we use clustering to estimate the gain from our approach consisting of dissimilarity + MDS. Hence, we analyze the partitions' quality obtained by clustering high dimensional data with various well known clustering algorithms based on density, connectivity and message passing, as well as simple algorithms like k-means and Hierarchical Clustering (HC). The quality gap between the partitions found by k-means and HC alone compared to k-means and HC using our new low dimensional vector space representation is remarkable. Moreover, our tests using high dimensional gene expression and image data confirm these results and show a steady performance, which surpasses spectral clustering and other algorithms relevant to our work.
publishDate 2016
dc.date.none.fl_str_mv 2016-06
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/52658
Baya, Ariel Emilio; Larese, Monica Graciela; Granitto, Pablo Miguel; Clustering using PK-D: A connectivity and density dissimilarity; Pergamon-Elsevier Science Ltd; Expert Systems with Applications; 51; 6-2016; 151-160
0957-4174
CONICET Digital
CONICET
url http://hdl.handle.net/11336/52658
identifier_str_mv Baya, Ariel Emilio; Larese, Monica Graciela; Granitto, Pablo Miguel; Clustering using PK-D: A connectivity and density dissimilarity; Pergamon-Elsevier Science Ltd; Expert Systems with Applications; 51; 6-2016; 151-160
0957-4174
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/doi/10.1016/j.eswa.2015.12.037
info:eu-repo/semantics/altIdentifier/url/https://www.sciencedirect.com/science/article/pii/S0957417415008453
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/zip
application/zip
application/pdf
dc.publisher.none.fl_str_mv Pergamon-Elsevier Science Ltd
publisher.none.fl_str_mv Pergamon-Elsevier Science Ltd
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
_version_ 1844613747482558464
score 13.070432