On the statistical comparison of feature selection methods and the role of experts: the case of Las Vegas strip
- Authors
- Barraza, Néstor Rubén; Moreno, Antonio A.
- Publication year
- 2020
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- A statistical comparison of feature selection methods is performed. Feature selection is an important issue in Data Mining and Data Science, and comparing the results obtained from different methods is difficult. The evaluation of metrics and ways of comparison is therefore an important matter of study. Our study is performed on a real dataset, previously analyzed in the literature, that contains a small number of records; this draws attention to the conclusions that can be drawn when only poor levels of statistical significance can be obtained because of a relatively low number of samples. The use of inter-rater agreement coefficients is introduced as a novel approach, extending a previous study. As shown, Boruta and tree-based methodologies perform rather well even on small data. Our metrics can be used to guide expert opinion in making the final decision. This work extends the results obtained in a previous analysis of the same dataset.
Sociedad Argentina de Informática
- Subject
- Computer Science
Big data
Feature selection
Wrapper
Filtered
Lasso
Expert role
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/3.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/116426
Full record metadata
id | SEDICI_0e2193d361cf46d4821678aaf7cdbfdb |
---|---|
oai_identifier_str | oai:sedici.unlp.edu.ar:10915/116426 |
network_acronym_str | SEDICI |
repository_id_str | 1329 |
network_name_str | SEDICI (UNLP) |
dc.title.none.fl_str_mv | On the statistical comparison of feature selection methods and the role of experts: the case of Las Vegas strip |
dc.creator.none.fl_str_mv | Barraza, Néstor Rubén; Moreno, Antonio A. |
author | Barraza, Néstor Rubén |
author_facet | Barraza, Néstor Rubén; Moreno, Antonio A. |
author_role | author |
author2 | Moreno, Antonio A. |
author2_role | author |
dc.subject.none.fl_str_mv | Ciencias Informáticas; Big data; Feature selection; Wrapper; Filtered; Lasso; Expert role |
publishDate | 2020 |
dc.date.none.fl_str_mv | 2020-10 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Objeto de conferencia; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
format | conferenceObject |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/116426 |
url | http://sedici.unlp.edu.ar/handle/10915/116426 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/http://49jaiio.sadio.org.ar/pdfs/asai/ASAI-02.pdf; info:eu-repo/semantics/altIdentifier/issn/2451-7585 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/3.0/; Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | http://creativecommons.org/licenses/by-nc-sa/3.0/; Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) |
dc.format.none.fl_str_mv | application/pdf; 15-27 |
dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
reponame_str | SEDICI (UNLP) |
collection | SEDICI (UNLP) |
instname_str | Universidad Nacional de La Plata |
instacron_str | UNLP |
institution | UNLP |
repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |