How Many Clusters: A Validation Index for Arbitrary-Shaped Clusters

Authors
Baya, Ariel Emilio; Granitto, Pablo Miguel
Publication year
2013
Language
English
Resource type
article
Status
published version
Description
Clustering validation indexes are intended to assess the goodness of clustering results. Many methods used to estimate the number of clusters rely on a validation index as a key element to find the correct answer. This paper presents a new validation index based on graph concepts, which has been designed to find arbitrary-shaped clusters by exploiting the spatial layout of the patterns and their clustering labels. This new clustering index is combined with a solid statistical detection framework, the Gap Statistic. The resulting method is able to find the right number of arbitrary-shaped clusters in diverse situations, as we show with examples where this information is available. A comparison with several relevant validation methods is carried out using artificial and gene expression datasets. The results are very encouraging, showing that the underlying structure in the data can be more accurately detected with the new clustering index. Our gene expression data results also indicate that this new index is stable under perturbation of the input data.
Affiliation: Baya, Ariel Emilio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico - Conicet - Rosario. Instituto Rosario de Investigaciones En Ciencias de la Educación; Argentina
Affiliation: Granitto, Pablo Miguel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico - Conicet - Rosario. Instituto Rosario de Investigaciones En Ciencias de la Educación; Argentina
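The description above names the Gap Statistic as the detection framework. As a rough illustration of that framework only, the sketch below estimates the number of clusters with the standard Gap Statistic rule. It does not implement the graph-based index from the paper: as stand-ins it assumes a small k-means with farthest-first initialization and the pooled within-cluster sum of squares as the dispersion measure, with reference data drawn uniformly over the bounding box. All function names and parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, rng, iters=20):
    """Tiny Lloyd's k-means (stand-in clusterer); returns one label per point."""
    # Farthest-first initialization: random first center, then the point
    # farthest from all centers chosen so far.
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def log_dispersion(points, labels, k):
    """Log of the pooled within-cluster sum of squares (assumed dispersion measure)."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        mean = tuple(sum(x) / len(members) for x in zip(*members))
        total += sum(math.dist(p, mean) ** 2 for p in members)
    return math.log(max(total, 1e-12))


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (estimated k, list of Gap(k) for k = 1..k_max)."""
    rng = random.Random(seed)
    lows = [min(x) for x in zip(*points)]
    highs = [max(x) for x in zip(*points)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = log_dispersion(points, kmeans(points, k, rng), k)
        ref_logs = []
        for _ in range(n_refs):
            # Reference data: uniform over the bounding box of the input.
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
            ref_logs.append(log_dispersion(ref, kmeans(ref, k, rng), k))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    # Standard rule: smallest k with Gap(k) >= Gap(k+1) - s(k+1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps


if __name__ == "__main__":
    data_rng = random.Random(1)
    blob_a = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(30)]
    blob_b = [(data_rng.gauss(100, 1), data_rng.gauss(0, 1)) for _ in range(30)]
    best_k, gap_values = gap_statistic(blob_a + blob_b, k_max=4)
    print(best_k, [round(g, 2) for g in gap_values])
```

Replacing `log_dispersion` with a different validation index is the natural extension point if one wanted to move toward the approach the description outlines; the sum-of-squares measure used here favors compact, roughly spherical clusters.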
Subject
CLUSTERING
GENOMIC DATA
VALIDATION INDEX
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/1459

Citation
Baya, Ariel Emilio; Granitto, Pablo Miguel; How Many Clusters: A Validation Index for Arbitrary-Shaped Clusters; IEEE Computer Society; IEEE/ACM Transactions on Computational Biology and Bioinformatics; 10; 2; 4-2013; 401-414
ISSN
1545-5963
URL
http://hdl.handle.net/11336/1459
DOI
10.1109/TCBB.2013.32