Taxonomic evidence applying algorithms of intelligent data mining : Asteroids families

Authors
Perichinsky, Gregorio; Servente, Magdalena; Servetto, Arturo Carlos; García Martínez, Ramón; Orellana, Rosa Beatriz; Plastino, Ángel Luis
Publication year
2003
Language
English
Resource type
conference paper
Status
published version
Description
Numerical Taxonomy aims to group operational taxonomic units (OTUs, or taxa) into clusters through numerical methods, using so-called structure analysis. The purpose of this recent series of projects was to find the clusters that constitute families. Structural analysis, based on the phenotypic characteristics of the OTUs, exhibits the relationships between two or more OTUs in terms of degrees of similarity. Entities formed by dynamic domains of attributes change according to taxonomic requirements: the classification of objects to form families. Taxonomic objects are represented by applying the semantics of the Dynamic Relational Database Model. Families of OTUs are obtained using two tools: i) the Euclidean distance and ii) nearest-neighbor techniques. Taxonomic evidence is thus gathered to quantify the similarity of each pair of OTUs (pair-group method) obtained from the basic data matrix. The main contribution so far is the concept of the spectrum of the OTUs, based on the states of their characters. The concept of families’ spectra emerges when the superposition principle is applied to the spectra of the OTUs, and the groups are delimited through the maximum of the Bienaymé-Tchebycheff relation, which determines invariants (centroid, variance and radius). A new taxonomic criterion is thereby formulated. An astronomical application is worked out: the result is a new criterion for the classification of asteroids in the hyperspace of orbital proper elements. Thus, a new approach to Computational Taxonomy is presented, one that has already been employed in connection with Data Mining. This paper analyses the application of Machine Learning techniques to Data Mining. We focused on the TDIDT (Top-Down Induction of Decision Trees) family of algorithms, which induce trees from pre-classified data, and in particular on the ID3 and C4.5 algorithms, created by Quinlan.
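The pair-group step mentioned in the description (Euclidean distance combined with nearest-neighbor grouping) can be illustrated with a minimal sketch. This is a generic single-linkage grouping, not the authors' implementation: the function names, the `threshold` parameter and the toy data matrix are all assumptions made for illustration, and the delimitation of groups through the Bienaymé-Tchebycheff relation is not reproduced here.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two OTUs given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_families(otus, threshold):
    """Group OTUs by single linkage: two OTUs share a family when a chain of
    neighbors, each closer than `threshold` (an assumed cut-off), links them."""
    parent = list(range(len(otus)))  # union-find forest, one root per family

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair of OTUs and merge the families of close pairs.
    for i, j in combinations(range(len(otus)), 2):
        if euclidean(otus[i], otus[j]) < threshold:
            parent[find(i)] = find(j)

    families = {}
    for i in range(len(otus)):
        families.setdefault(find(i), []).append(i)
    return sorted(families.values())


# Hypothetical data matrix: rows are OTUs, columns are character states.
data = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0)]
families = nearest_neighbor_families(data, threshold=0.5)
print(families)
```

Each family is reported as a list of row indices into the data matrix. A fixed distance cut-off is only the simplest way to close a family; the record describes a different, statistically motivated delimitation.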
We tried to determine how efficient the algorithms of the TDIDT family are when applied in data mining to generate valid models of the data in classification problems, based on the gain of entropy (information gain). Informatics (Data Mining and Computational Taxonomy) has always been the primary objective of our research.
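The entropy gain used by ID3 to choose the attribute to split on can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the paper: the attribute names `zone` and `size`, the class labels and the four toy rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy left after
    partitioning the rows by the values of `attribute`."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


# Hypothetical pre-classified data: two attributes, two classes.
rows = [
    {"zone": "inner", "size": "small"},
    {"zone": "inner", "size": "large"},
    {"zone": "outer", "size": "small"},
    {"zone": "outer", "size": "large"},
]
labels = ["A", "A", "B", "B"]

# ID3 splits on the attribute with the highest gain.
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
print(best)
```

In this toy set `zone` separates the classes perfectly while `size` carries no information, so the split falls on `zone`. C4.5 refines the criterion by normalizing the gain (gain ratio), which this sketch does not cover.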
Track: Databases
Red de Universidades con Carreras en Informática (RedUNCI)
Subject
Computer Science
classification
cluster (family)
spectrum
induction
divide and rule
entropy
database
Algorithms
Data mining
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/21405

Date
2003-05
URL
http://sedici.unlp.edu.ar/handle/10915/21405
Format
application/pdf
322-328