On the Importance of Data Representation for the Success of Text Classification

Authors: Cuello, Carolina Y.; Jofre Caradonna, Vanessa; Garciarena Ucelay, María José; Cagnina, Leticia
Publication year: 2022
Language: English
Resource type: conference paper
Status: published version
Description: Text mining approaches use natural language processing to automatically extract patterns from texts. Tasks such as topic labeling, news classification, question answering, named entity recognition, and sentiment analysis usually require elaborate and effective document representations. In this context, word representation models in general, and vector-based word representations in particular, have gained increasing interest as a way to alleviate some of the limitations of Bag of Words. In this article, we analyze several vector-based word representations, alongside the classical ones, in a polarity analysis task on movie reviews. Experimental results show that more elaborate representations are more effective than Bag of Words. In particular, the Concise Semantic Analysis representation appears to be very robust and effective, since it yields good results regardless of the classifier used. The dimensionality of each representation and the time required to obtain it are also reported, leading to the conclusion that classifiers are efficient when Concise Semantic Analysis is used.
XIX Workshop Base de Datos y Minería de Datos (WBDMD)
Red de Universidades con Carreras en Informática
Subject: Computer Science
text mining
text representations
text classification
movie reviews
sentiment analysis
polarity analysis
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/149536

id	SEDICI_843dd84015fbc5487cde6d0c0c148d98
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/149536
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
dc.title.none.fl_str_mv	On the Importance of Data Representation for the Success of Text Classification
dc.creator.none.fl_str_mv	Cuello, Carolina Y.; Jofre Caradonna, Vanessa; Garciarena Ucelay, María José; Cagnina, Leticia
dc.subject.none.fl_str_mv	Computer Science; text mining; text representations; text classification; movie reviews; sentiment analysis; polarity analysis
publishDate	2022
dc.date.none.fl_str_mv	2022-10
dc.type.none.fl_str_mv	info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Conference object; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia
format	conferenceObject
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/149536
url	http://sedici.unlp.edu.ar/handle/10915/149536
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/isbn/978-987-1364-31-2; info:eu-repo/semantics/reference/hdl/10915/149102
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
eu_rights_str_mv	openAccess
dc.format.none.fl_str_mv	application/pdf; 385-393
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP
reponame_str	SEDICI (UNLP)
collection	SEDICI (UNLP)
instname_str	Universidad Nacional de La Plata
instacron_str	UNLP
institution	UNLP
repository.name.fl_str_mv	SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar
