Set characterization-selection towards classification based on interaction index
- Authors
- Murillo, Javier; Guillaume, S.; Spetale, Flavio Ezequiel; Tapia, E.; Bulacio, P.
- Year of publication
- 2015
- Language
- English
- Resource type
- article
- Status
- published version
- Description
In many real-world datasets, both the individual and the coordinated action of features may be relevant for class identification. In this paper, a computational strategy for relevant feature selection based on the characterization of redundant or complementary features is proposed. The characterization is achieved using fuzzy measures and an interaction index computed from fuzzy measure coefficients. Fuzzy measure identification requires raw data to be turned into confidence degrees. This key step is carried out by considering the distributions of feature values across all the classes. Fuzzy measure coefficients are then estimated with an improved version of the Heuristic Least Mean Squares algorithm that includes an efficient management of untouched coefficients. Then, a generalization of the Shapley index for an arbitrary number of features is used. Simulation experiments on synthetic datasets are performed to study the behavior of this generalized interaction index. For extreme datasets, containing either redundant or complementary features as well as noise, the index value is given by a mathematical formula. This result is used to motivate feature selection guidelines that take feature interactions into account. Experimental results on benchmark datasets show that the proposal allows for the design of compact, interpretable and competitive classification models.
Affiliation: Murillo, Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y Sistemas; Argentina. Universidad Nacional de Rosario; Argentina
Affiliation: Guillaume, S. Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture; France
Affiliation: Spetale, Flavio Ezequiel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y Sistemas; Argentina. Universidad Nacional de Rosario; Argentina
Affiliation: Tapia, E. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y Sistemas; Argentina. Universidad Nacional de Rosario; Argentina
Affiliation: Bulacio, P. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y Sistemas; Argentina. Universidad Nacional de Rosario; Argentina
- Subject
- Choquet; Subset Characterization; HLMS; Generalized Shapley Index
- Access level
- open access
- Conditions of use
- https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/15165
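The description above says that redundant and complementary features are told apart by an interaction index computed from fuzzy measure coefficients, but this record does not reproduce the formula. As a rough illustration only, the sketch below assumes the generalized index coincides with Grabisch's interaction index, I(A) = Σ_{B ⊆ N\A} [(n-|B|-|A|)! |B|! / (n-|A|+1)!] Σ_{C ⊆ A} (-1)^{|A|-|C|} μ(B ∪ C), which reduces to the Shapley value when A is a single feature. The function names and the two toy measures are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def interaction_index(mu, n_features, group):
    """Grabisch's interaction index of `group` for the fuzzy measure `mu`.

    `mu` maps frozensets of feature indices to values in [0, 1].
    Assumption: this stands in for the paper's generalized index.
    """
    group = frozenset(group)
    rest = set(range(n_features)) - group
    n, a = n_features, len(group)
    total = 0.0
    for b in subsets(rest):
        weight = factorial(n - len(b) - a) * factorial(len(b)) / factorial(n - a + 1)
        # Discrete derivative of mu with respect to `group`, evaluated at b.
        delta = sum((-1) ** (a - len(c)) * mu[b | c] for c in subsets(group))
        total += weight * delta
    return total


# Two toy measures on three features (illustrative values only).
all_sets = list(subsets(range(3)))
# Redundant: any single feature already gives full confidence.
redundant = {s: (1.0 if s else 0.0) for s in all_sets}
# Complementary: confidence appears only when all three features are present.
complementary = {s: (1.0 if len(s) == 3 else 0.0) for s in all_sets}

pair_redundant = interaction_index(redundant, 3, {0, 1})
pair_complementary = interaction_index(complementary, 3, {0, 1})
shapley_feature_0 = interaction_index(redundant, 3, {0})
```

Under this assumption, a negative pairwise value suggests that the two features carry overlapping information, while a positive value suggests that they are useful mainly in combination.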
id |
CONICETDig_0c870134ea6b7bcb075cc4baa8b03f81 |
---|---|
oai_identifier_str |
oai:ri.conicet.gov.ar:11336/15165 |
network_acronym_str |
CONICETDig |
repository_id_str |
3498 |
network_name_str |
CONICET Digital (CONICET) |
dc.title.none.fl_str_mv |
Set characterization-selection towards classification based on interaction index |
dc.creator.none.fl_str_mv |
Murillo, Javier Guillaume, S. Spetale, Flavio Ezequiel Tapia, E. Bulacio, P. |
dc.subject.none.fl_str_mv |
Choquet Subset Characterization Hlms Generalized Shapley Index |
purl_subject.fl_str_mv |
https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
publishDate |
2015 |
dc.date.none.fl_str_mv |
2015-07 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11336/15165 Murillo, Javier; Guillaume, S.; Spetale, Flavio Ezequiel; Tapia, E.; Bulacio, P.; Set characterization-selection towards classification based on interaction index; Elsevier Science; Fuzzy Sets And Systems; 270; 7-2015; 74-89 0165-0114 |
dc.language.none.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/doi/10.1016/j.fss.2014.09.015 info:eu-repo/semantics/altIdentifier/url/http://www.sciencedirect.com/science/article/pii/S0165011414004229 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
dc.format.none.fl_str_mv |
application/pdf |
dc.publisher.none.fl_str_mv |
Elsevier Science |
dc.source.none.fl_str_mv |
reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv |
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |