Cluster Ensembles for Big Data Mining Problems

Authors: Pividori, Milton; Stegmayer, Georgina; Milone, Diego H.
Publication year: 2015
Language: English
Resource type: conference paper
Status: published version
Description: Mining big data involves several problems and new challenges, in addition to the huge volume of information. On the one hand, these data generally come from autonomous and decentralized sources, so their dimensionality is heterogeneous and diverse, and they generally involve privacy issues. On the other hand, data mining algorithms, such as clustering methods, have particular characteristics that make them useful for different types of data mining problems. Due to the huge amount of information, the task of choosing a single clustering approach becomes even more difficult. For instance, k-means, a very popular algorithm, always assumes spherical clusters in the data; hierarchical approaches can be used when there is interest in finding this type of structure; expectation-maximization iteratively adjusts the parameters of a statistical model to fit the observed data. Moreover, all these methods work properly only with relatively small data sets. Large-volume data often make their application unfeasible, all the more so when the data come from autonomous sources that are constantly growing and evolving. In recent years, a new clustering approach has emerged, called consensus clustering or cluster ensembles. Instead of running a single algorithm, this approach first produces a set of data partitions (the ensemble) by applying different clustering techniques to the same original data set. Then, this ensemble is processed by a consensus function, which produces a single consensus partition that outperforms the individual solutions in the input ensemble. This approach has been successfully employed for distributed data mining, which makes it very interesting and applicable in the big data context. Although many techniques have been proposed for large data sets, most of them focus on making individual components more efficient, instead of improving the whole consensus approach for the case of big data.
Sociedad Argentina de Informática e Investigación Operativa (SADIO)
Subject: Computer Science
Data mining
Big data
Clustering
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-sa/3.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/51984
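
The two-stage scheme summarized in the description (an ensemble of partitions, then a consensus function) can be illustrated with a minimal sketch. The record does not say which consensus function the authors use, so the code below swaps in one common choice, a co-association (evidence accumulation) approach: two objects are linked when they share a cluster in more than half of the partitions, and the connected components of those links form the consensus partition. The function names, the 0.5 threshold, and the toy ensemble are all assumptions made for illustration.

```python
from itertools import combinations


def co_association(ensemble):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(ensemble[0])
    m = len(ensemble)
    counts = [[0] * n for _ in range(n)]
    for labels in ensemble:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_partition(ensemble, threshold=0.5):
    """Assumed consensus function: link pairs co-clustered in more than
    `threshold` of the partitions, then label the connected components."""
    n = len(ensemble[0])
    coassoc = co_association(ensemble)
    parent = list(range(n))  # union-find forest over the objects

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if coassoc[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical ensemble: three partitions of six objects. In practice each
# row would come from a different clustering algorithm or data source.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, labels permuted
    [0, 0, 1, 1, 1, 1],  # object 2 placed in the other group
]

print(consensus_partition(ensemble))  # prints [0, 0, 0, 1, 1, 1]
```

Because the consensus step only needs the label vectors, not the raw data, each partition could be computed where its data lives, which is consistent with the distributed use the description mentions. Note that the co-association matrix grows quadratically with the number of objects, so this sketch is not itself a big data solution.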


Publication date: 2015-09
Format: application/pdf
Pages: 52-54
Handle: http://sedici.unlp.edu.ar/handle/10915/51984
Related identifiers: http://44jaiio.sadio.org.ar/sites/default/files/agranda52-54.pdf (URL); 2451-7569 (ISSN)
License: Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
