Set characterization-selection towards classification based on interaction index

Authors
Murillo, Javier; Guillaume, S.; Spetale, Flavio Ezequiel; Tapia, E.; Bulacio, P.
Publication year
2015
Language
English
Resource type
article
Status
published version
Description
In many real-world datasets, both the individual and the coordinated action of features may be relevant for class identification. In this paper, a computational strategy for relevant feature selection based on the characterization of redundant or complementary features is proposed. The characterization is achieved using fuzzy measures and an interaction index computed from fuzzy measure coefficients. Fuzzy measure identification requires raw data to be turned into confidence degrees. This key step is carried out by considering the distributions of feature values across all the classes. Fuzzy measure coefficients are then estimated with an improved version of the Heuristic Least Mean Squares algorithm that includes efficient management of untouched coefficients. Then, a generalization of the Shapley index to an arbitrary number of features is used. Simulation experiments on synthetic datasets are performed to study the behavior of this generalized interaction index. For extreme datasets, containing either redundant or complementary features as well as noise, the index value is given by a mathematical formula. This result is used to motivate feature selection guidelines that take feature interactions into account. Experimental results on benchmark datasets show that the proposal allows for the design of compact, interpretable and competitive classification models.
Affiliation: Murillo, Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y Sistemas; Argentina. Universidad Nacional de Rosario; Argentina
Affiliation: Guillaume, S. Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture; France
Affiliation: Spetale, Flavio Ezequiel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y Sistemas; Argentina. Universidad Nacional de Rosario; Argentina
Affiliation: Tapia, E. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y Sistemas; Argentina. Universidad Nacional de Rosario; Argentina
Affiliation: Bulacio, P. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y Sistemas; Argentina. Universidad Nacional de Rosario; Argentina
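The description mentions an interaction index computed from fuzzy measure coefficients for an arbitrary number of features. This record does not reproduce the paper's formula, so the sketch below assumes the standard interaction index for a feature subset A (Grabisch's generalization of the Shapley value); it may differ in detail from the index the authors use. The names `subsets`, `interaction_index`, and the three toy measures are illustrative and do not come from the paper.

```python
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def interaction_index(mu, features, subset):
    """Interaction index of `subset` under the fuzzy measure `mu`.

    ASSUMPTION: this is the standard interaction index, not necessarily
    the exact formula of the paper. `mu` maps every frozenset of
    `features` to a coefficient in [0, 1].
    """
    n = len(features)
    a = frozenset(subset)
    rest = frozenset(features) - a
    total = 0.0
    for b in subsets(rest):
        # Combinatorial weight of the coalition B drawn from outside A.
        weight = (factorial(n - len(b) - len(a)) * factorial(len(b))
                  / factorial(n - len(a) + 1))
        # Discrete derivative of mu with respect to A, evaluated at B.
        delta = sum((-1) ** (len(a) - len(c)) * mu[b | c] for c in subsets(a))
        total += weight * delta
    return total


# Toy fuzzy measures on two hypothetical features.
features = ("x1", "x2")
redundant = {s: (1.0 if s else 0.0) for s in subsets(features)}
complementary = {s: (1.0 if len(s) == len(features) else 0.0)
                 for s in subsets(features)}
additive = {s: len(s) / len(features) for s in subsets(features)}

for name, mu in [("redundant", redundant),
                 ("complementary", complementary),
                 ("additive", additive)]:
    print(name, interaction_index(mu, features, features))
```

Under this assumed definition, a negative value for a pair indicates redundancy, a positive value indicates complementarity, and a value of zero indicates no interaction. For a single feature the index reduces to the Shapley value.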
Subject
Choquet
Subset Characterization
HLMS
Generalized Shapley Index
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/15165

Subject classification
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
Publication date
2015-07
Publisher
Elsevier Science
Citation
Murillo, Javier; Guillaume, S.; Spetale, Flavio Ezequiel; Tapia, E.; Bulacio, P.; Set characterization-selection towards classification based on interaction index; Elsevier Science; Fuzzy Sets and Systems; 270; 7-2015; 74-89
ISSN
0165-0114
DOI
10.1016/j.fss.2014.09.015
Publisher URL
http://www.sciencedirect.com/science/article/pii/S0165011414004229
Handle
http://hdl.handle.net/11336/15165
Format
application/pdf
Resource type identifiers
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
Rights
info:eu-repo/semantics/openAccess
Repository contact
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar