A new AntTree-based algorithm for clustering short-text corpora
- Authors
- Errecalde, Marcelo Luis; Ingaramo, Diego Alejandro; Rosso, Paolo
- Publication year
- 2010
- Language
- English
- Resource type
- article
- Status
- published version
- Description
Short-text clustering is an important research area because of the current tendency for people to communicate in "small language", e.g. blogs, text messaging, and similar media. Recent works have proposed new bio-inspired clustering algorithms for this difficult problem, and novel uses of Internal Clustering Validity Measures have also been presented. In this work, a new AntTree-based approach is proposed for this task. It integrates information from the Silhouette Coefficient and the concept of attraction of a cluster at different stages of the clustering process. The proposal achieves results comparable to the best reported in this area, shows notable stability in the quality of its results, and displays promising capabilities as a general method for improving arbitrary clustering approaches.
Facultad de Informática
- Subjects
- Computer Science (Ciencias Informáticas)
- short-text clustering
- bio-inspired algorithms
- internal validity measures
- silhouette coefficient
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc/3.0/
- Repository
- SEDICI (UNLP)
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/9660
Full record metadata
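The description above relies on the Silhouette Coefficient, an internal validity measure that scores a clustering without gold-standard labels. As a reference point only, the sketch below computes the standard per-object silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)) and its average over a corpus. It is not the authors' AntTree-based algorithm and does not model their notion of cluster attraction. The cosine distance, the function names (`cosine_distance`, `silhouette_scores`, `global_silhouette`), and the toy vectors are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity (assumed distance; vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def silhouette_scores(points: Sequence[Vector], labels: Sequence[int]) -> List[float]:
    """Per-object silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores: List[float] = []
    for i, p in enumerate(points):
        # Mean distance from object i to the members of each cluster, excluding i itself.
        mean_dist: Dict[int, float] = {}
        for c in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(cosine_distance(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in mean_dist:
            # Usual convention: an object alone in its cluster gets s(i) = 0.
            scores.append(0.0)
            continue
        a = mean_dist[own]  # cohesion: average distance within its own cluster
        b = min(d for c, d in mean_dist.items() if c != own)  # nearest other cluster
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


def global_silhouette(points: Sequence[Vector], labels: Sequence[int]) -> float:
    """Average s(i) over all objects; values lie in [-1, 1], higher is better."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2-term "documents": two near-duplicate pairs (hypothetical data).
    docs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
    print(global_silhouette(docs, [0, 0, 1, 1]))  # well-separated grouping: close to 1
    print(global_silhouette(docs, [0, 1, 0, 1]))  # mixed-up grouping: negative
```

Because the score uses only the data and the candidate grouping, it can compare alternative clusterings of the same corpus, which is the role the description assigns to internal validity measures.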
id | SEDICI_93b1d644f6344a01f83c9d2d1403b0d5 |
---|---|
oai_identifier_str | oai:sedici.unlp.edu.ar:10915/9660 |
network_acronym_str | SEDICI |
repository_id_str | 1329 |
network_name_str | SEDICI (UNLP) |
dc.title.none.fl_str_mv | A new AntTree-based algorithm for clustering short-text corpora |
dc.creator.none.fl_str_mv | Errecalde, Marcelo Luis; Ingaramo, Diego Alejandro; Rosso, Paolo |
dc.subject.none.fl_str_mv | Computer Science (Ciencias Informáticas); short-text clustering; bio-inspired algorithms; internal validity measures; silhouette coefficient |
dc.date.none.fl_str_mv | 2010-04 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; Article; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/9660 |
dc.language.none.fl_str_mv | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Apr10-1.pdf; info:eu-repo/semantics/altIdentifier/issn/1666-6038 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc/3.0/; Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0) |
dc.format.none.fl_str_mv | application/pdf; pages 1-7 |
dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |