Silhouette + Attraction: A Simple and Effective Method for Text Clustering

Authors
Errecalde, Marcelo L.; Cagnina, Leticia Cecilia; Rosso, Paolo
Publication year
2015
Language
English
Resource type
article
Status
published version
Description
This article presents Sil-Att, a simple and effective method for text clustering based on two main concepts: the silhouette coefficient and the idea of attraction. Combining both principles yields a general technique that can be used either as a boosting method, which improves the results of other clustering algorithms, or as an independent clustering algorithm. The experimental work shows that Sil-Att obtains high-quality results on text corpora with very different characteristics. Furthermore, its stable performance on all the corpora considered indicates that it is a very robust method. This is a notable advantage of Sil-Att over the other algorithms used in the experiments, whose performance depends heavily on specific characteristics of the corpora being considered.
Affiliation: Errecalde, Marcelo L. Universidad Nacional de San Luis. Facultad de Ciencias Físico Matemáticas y Naturales. Departamento de Informática. Laboratorio Investigación y Desarrollo En Inteligencia Computacional; Argentina
Affiliation: Cagnina, Leticia Cecilia. Universidad Nacional de San Luis. Facultad de Ciencias Físico Matemáticas y Naturales. Departamento de Informática. Laboratorio Investigación y Desarrollo En Inteligencia Computacional; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Rosso, Paolo. Universidad Politecnica de Valencia; Spain
Subject
Clustering
Short Texts Corpora
Attraction
Silhouette
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/7135

Subject classification
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
Publication date
2015-08-14
Identifier
http://hdl.handle.net/11336/7135
Citation
Errecalde, Marcelo L.; Cagnina, Leticia Cecilia; Rosso, Paolo; Silhouette + Attraction: A Simple and Effective Method for Text Clustering; Cambridge University Press; Natural Language Engineering; 1; 14-8-2015; 1-40
ISSN
1351-3249
Related identifiers
DOI: 10.1017/S1351324915000273
http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=9910907
Format
application/pdf
application/zip
Publisher
Cambridge University Press