A PSO-based clustering approach assisted by initial clustering information

Authors
Velázquez, Carlos; Cagnina, Leticia; Errecalde, Marcelo Luis
Publication year
2012
Language
English
Resource type
conference paper
Status
published version
Description
Clustering of short texts is an important research area because of its applicability in information retrieval and text mining. CLUDIPSO, a discrete Particle Swarm Optimization algorithm, was proposed to cluster short texts. Initial results showed that CLUDIPSO performed well on small collections of short texts. However, later work revealed some drawbacks when dealing with larger collections. In this paper we present a hybridization of CLUDIPSO that overcomes these drawbacks by providing information in the initial cycles of the algorithm, which avoids a purely random search and thus speeds up convergence. This is achieved by including in the initial population of the algorithm a pre-clustering obtained with the Expectation-Maximization method. The results obtained with the hybrid version show a significant improvement over those obtained with the original version.
Track: Workshop Bases de datos y minería de datos (WBDDM)
Red de Universidades con Carreras en Informática (RedUNCI)
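The seeding idea in the description (placing a pre-clustering in the initial population instead of starting from a fully random swarm) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the particle representation (one cluster id per document), the function name `init_population`, the choice to seed a single particle, and the hard-coded `pre_clustering` vector, which stands in for the labels an Expectation-Maximization run would produce.

```python
import random


def init_population(n_docs, n_clusters, pop_size, seed_labels=None, rng=None):
    """Build an initial swarm of label vectors (hypothetical helper).

    Assumption: a particle is a list of n_docs integers in
    [0, n_clusters), one cluster id per document. If seed_labels is
    given (e.g. a pre-clustering from an EM run), it replaces the first
    random particle; the remaining particles stay random.
    """
    rng = rng or random.Random()
    population = [
        [rng.randrange(n_clusters) for _ in range(n_docs)]
        for _ in range(pop_size)
    ]
    if seed_labels is not None:
        if len(seed_labels) != n_docs:
            raise ValueError("seed_labels must have one label per document")
        if any(not 0 <= label < n_clusters for label in seed_labels):
            raise ValueError("seed_labels contains an out-of-range cluster id")
        population[0] = list(seed_labels)
    return population


# Hypothetical pre-clustering of 6 short texts into 2 groups.
pre_clustering = [0, 0, 1, 1, 0, 1]
swarm = init_population(
    n_docs=6,
    n_clusters=2,
    pop_size=5,
    seed_labels=pre_clustering,
    rng=random.Random(0),
)
```

With this setup, the swarm starts with at least one particle that already encodes a plausible grouping, so the early cycles have an informed candidate to follow rather than only random assignments.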
Subject
Computer Science
Short-Text Clustering
Bio-Inspired Methods
PSO-based Clustering
Hybrid Methods
Expectation-Maximization
Initialization Approaches
Clustering
Databases
Data mining
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/23753

Publication date
2012-10
Format
application/pdf
URL
http://sedici.unlp.edu.ar/handle/10915/23753
License
Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5)
Repository contact
alira@sedici.unlp.edu.ar