On the assessment of personality traits by using text mining techniques

Authors
Montenegro, Luis; Sapino, Maximiliano; Ferretti, Edgardo; Cagnina, Leticia
Publication year
2023
Language
English
Resource type
conference document
Status
published version
Description
This paper reports a complete application of the Knowledge Discovery in Databases process to a personality trait assessment problem using text mining techniques. Because this work is part of an interdisciplinary study between researchers in Computer Science and Psychology, this first approach evaluates four simple predictive algorithms: Multinomial Naive Bayes, Logistic Regression, Support Vector Machines and Decision Trees. Since one person may present more than one personality trait, but not necessarily all of them, the problem was modeled as three different classification tasks: binary, multiclass and multilabel. Data augmentation was used to improve the performance of all the classification approaches evaluated. Binary classification benefited most from this technique, improving its performance by 13% on average compared to the original dataset. For three of the five personality traits studied, it achieves weighted-F1 scores above 0.75, with the highest score, 0.88, achieved for the Responsibility trait.
Red de Universidades con Carreras en Informática
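The description reports results as weighted-F1 scores. As a minimal sketch of how that metric is commonly computed (per-class F1 averaged with weights proportional to each class's support), the following pure-Python example may help. The function name `weighted_f1` and the toy labels are illustrative assumptions, not taken from the paper, and the record does not state how the authors computed the metric.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Made-up labels for a single binary task (1 = trait present, 0 = absent).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(round(weighted_f1(y_true, y_pred), 3))  # prints 0.75
```

In this toy case the positive class has F1 = 0.8 with support 5 and the negative class has F1 = 2/3 with support 3, so the weighted score is 5/8 × 0.8 + 3/8 × 2/3 = 0.75.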
Subject
Computer Science
Personality Traits
Big Five Factors
Knowledge Discovery in Databases
Text Mining
Data Augmentation
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/164884

id SEDICI_1c31c479805076573003d69aa092dfa1
oai_identifier_str oai:sedici.unlp.edu.ar:10915/164884
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
dc.date.none.fl_str_mv 2023-10
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Conference object
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/164884
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/isbn/978-987-9285-51-0
info:eu-repo/semantics/reference/url/https://sedici.unlp.edu.ar/handle/10915/163107
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv application/pdf
114-123
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar