On the statistical comparison of feature selection methods and the role of experts: the case of Las Vegas strip

Authors
Barraza, Néstor Rubén; Moreno, Antonio A.
Year of publication
2020
Language
English
Resource type
conference paper
Status
published version
Description
A statistical comparison of feature selection methods is performed. Feature selection is an important issue in Data Mining and Data Science, and comparing the results obtained by different methods is difficult. The choice of evaluation metrics and comparison procedures is therefore an important subject of study. Our study uses a real dataset, previously analyzed in the literature, that contains a small number of records; this draws attention to how conclusions should be applied when the relatively low number of samples yields poor statistical significance. The use of inter-rater agreement coefficients is introduced as a novel approach, extending a previous study. As shown, Boruta and tree-based methods perform rather well even on small datasets. Our metrics can be used to guide expert opinion in making the final decision. This work extends the results of a previous analysis of the same dataset.
Sociedad Argentina de Informática
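The description proposes inter-rater agreement coefficients as a way to compare feature selection methods. One common coefficient of this kind is Cohen's kappa, which corrects the observed agreement between two raters for the agreement expected by chance. The minimal sketch below assumes that each feature selection method acts as a "rater" labelling every candidate feature as selected (1) or not selected (0); this encoding, the function names and the feature subsets are illustrative assumptions, not taken from the paper, and the subsets are hypothetical placeholders rather than results on the Las Vegas Strip dataset.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items that receive the same label from both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters gave every item the same single label: kappa is undefined (0/0).
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


def selection_vector(all_features: Sequence[str], selected: Iterable[str]) -> list[int]:
    """Hypothetical encoding: 1 if a feature was selected by a method, else 0."""
    chosen = set(selected)
    return [1 if feature in chosen else 0 for feature in all_features]


# Hypothetical candidate features and the subsets chosen by two hypothetical methods.
features = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]
method_a = {"f1", "f2", "f3", "f5"}
method_b = {"f1", "f2", "f4", "f5"}

vec_a = selection_vector(features, method_a)
vec_b = selection_vector(features, method_b)
print(cohen_kappa(vec_a, vec_b))  # prints 0.5
```

Here the two methods agree on 6 of 8 features (observed agreement 0.75), each selects half of the features (chance agreement 0.5), so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. With more than two methods, pairwise kappas can be tabulated, or a multi-rater coefficient such as Fleiss' kappa can be used instead.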
Subject
Computer Science
Big data
Feature selection
Wrapper
Filtered
Lasso
Expert role
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/3.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/116426
Publication date
2020-10
Pages
15-27
Format
application/pdf
Handle
http://sedici.unlp.edu.ar/handle/10915/116426
Alternative URL
http://49jaiio.sadio.org.ar/pdfs/asai/ASAI-02.pdf
ISSN
2451-7585
License
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)