On the Importance of Data Representation for the Success of Text Classification
- Authors
- Cuello, Carolina Y.; Jofre Caradonna, Vanessa; Garciarena Ucelay, María José; Cagnina, Leticia
- Publication year
- 2022
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Text mining approaches use natural language processing to automatically extract patterns from texts. Tasks such as topic labeling, news classification, question answering, named entity recognition, and sentiment analysis usually require elaborate and effective document representations. In this context, word representation models in general, and vector-based word representations in particular, have gained increasing interest as a way to alleviate some of the limitations that Bag of Words exhibits. In this article, we analyze the use of several vector-based word representations, in addition to the classical ones, in a polarity analysis task on movie reviews. Experimental results show the effectiveness of more elaborate representations in comparison to Bag of Words. In particular, the Concise Semantic Analysis representation seems to be very robust and effective because, regardless of the classifier it is used with, the results are very good. The dimensionality of the representations and the time needed to obtain them are also reported, leading to the conclusion that the classifiers are efficient when Concise Semantic Analysis is used.
XIX Workshop Base de Datos y Minería de Datos (WBDMD)
Red de Universidades con Carreras en Informática
- Subject
- Ciencias Informáticas (Computer Science); text mining; text representations; text classification; movie reviews; sentiment analysis; polarity analysis
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/149536
Full record metadata
id | SEDICI_843dd84015fbc5487cde6d0c0c148d98 |
---|---|
oai_identifier_str | oai:sedici.unlp.edu.ar:10915/149536 |
network_acronym_str | SEDICI |
repository_id_str | 1329 |
network_name_str | SEDICI (UNLP) |
dc.title.none.fl_str_mv | On the Importance of Data Representation for the Success of Text Classification |
dc.creator.none.fl_str_mv | Cuello, Carolina Y.; Jofre Caradonna, Vanessa; Garciarena Ucelay, María José; Cagnina, Leticia |
dc.subject.none.fl_str_mv | Ciencias Informáticas; text mining; text representations; text classification; movie reviews; sentiment analysis; polarity analysis |
publishDate | 2022 |
dc.date.none.fl_str_mv | 2022-10 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Objeto de conferencia; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
format | conferenceObject |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/149536 |
dc.language.none.fl_str_mv | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/isbn/978-987-1364-31-2; info:eu-repo/semantics/reference/hdl/10915/149102 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.format.none.fl_str_mv | application/pdf; 385-393 |
dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |