A new AntTree-based algorithm for clustering short-text corpora

Authors
Errecalde, Marcelo Luis; Ingaramo, Diego Alejandro; Rosso, Paolo
Publication year
2010
Language
English
Resource type
article
Status
published version
Description
Short-text clustering is an important research area because people increasingly communicate in "small-language" text such as blogs, text messages and similar media. Recent work has proposed new bio-inspired clustering algorithms for this difficult problem and has presented novel uses of Internal Clustering Validity Measures. This work proposes a new AntTree-based approach to the task. It integrates information on the Silhouette Coefficient and the concept of attraction of a cluster in different stages of the clustering process. The proposal achieves results comparable to the best reported in this area, shows stable result quality, and has some interesting capabilities as a general improvement method for arbitrary clustering approaches.
Facultad de Informática
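For readers unfamiliar with the Silhouette Coefficient named in the description, the following is a minimal sketch of the standard measure, not the authors' AntTree-based algorithm. The function names and the use of Euclidean distance on numeric vectors are assumptions made for illustration only; the record does not say which document representation or distance the article uses.

```python
from math import sqrt


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors (an assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def silhouette_scores(points, labels, dist=euclidean):
    """Return the silhouette value s(i) of every point.

    s(i) = (b - a) / max(a, b), where a is the mean distance from point i
    to the other members of its own cluster and b is the smallest mean
    distance from point i to the members of any other cluster.
    Points in singleton clusters get s(i) = 0 by convention.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_coefficient(points, labels, dist=euclidean):
    """Global silhouette: the mean of s(i) over all points."""
    scores = silhouette_scores(points, labels, dist)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: two well-separated pairs of points.
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(silhouette_coefficient(pts, [0, 0, 1, 1]))  # grouping that matches the geometry
    print(silhouette_coefficient(pts, [0, 1, 0, 1]))  # grouping that cuts across it
```

Values close to 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own. This is why the measure can be consulted during a clustering process as well as after it.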
Subject
Computer Science
short-text clustering
bio-inspired algorithms
internal validity measures
silhouette coefficient
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc/3.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/9660

id SEDICI_93b1d644f6344a01f83c9d2d1403b0d5
oai_identifier_str oai:sedici.unlp.edu.ar:10915/9660
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
dc.date.none.fl_str_mv 2010-04
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Articulo
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/9660
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Apr10-1.pdf
info:eu-repo/semantics/altIdentifier/issn/1666-6038
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc/3.0/
Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0)
dc.format.none.fl_str_mv application/pdf
1-7
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar