Prediction of user retweets based on social neighborhood information and topic modelling

Autores
Celayes, Pablo Gabriel; Domínguez, Martín Ariel
Año de publicación
2017
Idioma
inglés
Tipo de recurso
documento de conferencia
Estado
versión publicada
Descripción
Ponencia presentada en la 16th Mexican International Conference on Artificial Intelligence. October 23 to 28, Ensenada, Baja California, Mexico.
Fil: Celayes, Pablo Gabriel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.
Fil: Domínguez, Martín Ariel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.
Twitter and other social networks have become a fundamental source of information and a powerful tool to spread ideas and opinions. A crucial step in understanding the mechanisms that drive information diffusion in Twitter, is to study the influence of the social neighborhood of a user in the construction of her retweeting preferences. In particular, to what extent can the preferences of a user be predicted given the preferences of her neighborhood.We build our own sample graph of Twitter users and study the problem of pre- dicting retweets from a given user based on the retweeting behavior occurring in her second-degree social neighborhood (followed and followed-by-followed). We manage to train and evaluate user-centered binary classification models that predict retweets with an average F 1 score of 87.6%, based purely on social in- formation, that is, without analyzing the content of the tweets.For users getting low scores with such models (on a tuning dataset), we improve the results by adding features extracted from the content of tweets. To do so, we apply a Natural Language Processing (NLP) pipeline including a Twitter-specific adaptation of the Latent Dirichlet Allocation (LDA) probabilistic topic model.
Fil: Celayes, Pablo Gabriel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.
Fil: Domínguez, Martín Ariel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.
Otras Ciencias de la Computación e Información
Materia
Machine learning
Social networks
Topic modelling
Natural language processing
Nivel de accesibilidad
acceso abierto
Condiciones de uso
Repositorio
Repositorio Digital Universitario (UNC)
Institución
Universidad Nacional de Córdoba
OAI Identificador
oai:rdu.unc.edu.ar:11086/552488

id RDUUNC_2358afae318f97dc316b59bdb1090fbb
oai_identifier_str oai:rdu.unc.edu.ar:11086/552488
network_acronym_str RDUUNC
repository_id_str 2572
network_name_str Repositorio Digital Universitario (UNC)
spelling Prediction of user retweets based on social neighborhood information and topic modellingCelayes, Pablo GabrielDomínguez, Martín ArielMachine learningSocial networksTopic modellingNatural language processingPonencia presentada en la 16th Mexican International Conference on Artificial Intelligence. October 23 to 28, Ensenada, Baja California, Mexico.Fil: Celayes, Pablo Gabriel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.Fil: Domínguez, Martín Ariel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.Twitter and other social networks have become a fundamental source of information and a powerful tool to spread ideas and opinions. A crucial step in understanding the mechanisms that drive information diffusion in Twitter, is to study the influence of the social neighborhood of a user in the construction of her retweeting preferences. In particular, to what extent can the preferences of a user be predicted given the preferences of her neighborhood.We build our own sample graph of Twitter users and study the problem of pre- dicting retweets from a given user based on the retweeting behavior occurring in her second-degree social neighborhood (followed and followed-by-followed). We manage to train and evaluate user-centered binary classification models that predict retweets with an average F 1 score of 87.6%, based purely on social in- formation, that is, without analyzing the content of the tweets.For users getting low scores with such models (on a tuning dataset), we improve the results by adding features extracted from the content of tweets. To do so, we apply a Natural Language Processing (NLP) pipeline including a Twitter-specific adaptation of the Latent Dirichlet Allocation (LDA) probabilistic topic model.Fil: Celayes, Pablo Gabriel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.Fil: Domínguez, Martín Ariel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.Otras Ciencias de la Computación e Información2017info:eu-repo/semantics/conferenceObjectinfo:eu-repo/semantics/publishedVersionhttp://purl.org/coar/resource_type/c_5794info:ar-repo/semantics/documentoDeConferenciaapplication/pdfhttp://hdl.handle.net/11086/552488enginfo:eu-repo/semantics/openAccessreponame:Repositorio Digital Universitario (UNC)instname:Universidad Nacional de Córdobainstacron:UNC2025-10-16T09:31:35Zoai:rdu.unc.edu.ar:11086/552488Institucionalhttps://rdu.unc.edu.ar/Universidad públicaNo correspondehttp://rdu.unc.edu.ar/oai/snrdoca.unc@gmail.comArgentinaNo correspondeNo correspondeNo correspondeopendoar:25722025-10-16 09:31:35.502Repositorio Digital Universitario (UNC) - Universidad Nacional de Córdobafalse
dc.title.none.fl_str_mv Prediction of user retweets based on social neighborhood information and topic modelling
title Prediction of user retweets based on social neighborhood information and topic modelling
spellingShingle Prediction of user retweets based on social neighborhood information and topic modelling
Celayes, Pablo Gabriel
Machine learning
Social networks
Topic modelling
Natural language processing
title_short Prediction of user retweets based on social neighborhood information and topic modelling
title_full Prediction of user retweets based on social neighborhood information and topic modelling
title_fullStr Prediction of user retweets based on social neighborhood information and topic modelling
title_full_unstemmed Prediction of user retweets based on social neighborhood information and topic modelling
title_sort Prediction of user retweets based on social neighborhood information and topic modelling
dc.creator.none.fl_str_mv Celayes, Pablo Gabriel
Domínguez, Martín Ariel
author Celayes, Pablo Gabriel
author_facet Celayes, Pablo Gabriel
Domínguez, Martín Ariel
author_role author
author2 Domínguez, Martín Ariel
author2_role author
dc.subject.none.fl_str_mv Machine learning
Social networks
Topic modelling
Natural language processing
topic Machine learning
Social networks
Topic modelling
Natural language processing
dc.description.none.fl_txt_mv Ponencia presentada en la 16th Mexican International Conference on Artificial Intelligence. October 23 to 28, Ensenada, Baja California, Mexico.
Fil: Celayes, Pablo Gabriel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.
Fil: Domínguez, Martín Ariel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.
Twitter and other social networks have become a fundamental source of information and a powerful tool to spread ideas and opinions. A crucial step in understanding the mechanisms that drive information diffusion in Twitter, is to study the influence of the social neighborhood of a user in the construction of her retweeting preferences. In particular, to what extent can the preferences of a user be predicted given the preferences of her neighborhood.We build our own sample graph of Twitter users and study the problem of pre- dicting retweets from a given user based on the retweeting behavior occurring in her second-degree social neighborhood (followed and followed-by-followed). We manage to train and evaluate user-centered binary classification models that predict retweets with an average F 1 score of 87.6%, based purely on social in- formation, that is, without analyzing the content of the tweets.For users getting low scores with such models (on a tuning dataset), we improve the results by adding features extracted from the content of tweets. To do so, we apply a Natural Language Processing (NLP) pipeline including a Twitter-specific adaptation of the Latent Dirichlet Allocation (LDA) probabilistic topic model.
Fil: Celayes, Pablo Gabriel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.
Fil: Domínguez, Martín Ariel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.
Otras Ciencias de la Computación e Información
description Ponencia presentada en la 16th Mexican International Conference on Artificial Intelligence. October 23 to 28, Ensenada, Baja California, Mexico.
publishDate 2017
dc.date.none.fl_str_mv 2017
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
format conferenceObject
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11086/552488
url http://hdl.handle.net/11086/552488
dc.language.none.fl_str_mv eng
language eng
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:Repositorio Digital Universitario (UNC)
instname:Universidad Nacional de Córdoba
instacron:UNC
reponame_str Repositorio Digital Universitario (UNC)
collection Repositorio Digital Universitario (UNC)
instname_str Universidad Nacional de Córdoba
instacron_str UNC
institution UNC
repository.name.fl_str_mv Repositorio Digital Universitario (UNC) - Universidad Nacional de Córdoba
repository.mail.fl_str_mv oca.unc@gmail.com
_version_ 1846143402730586112
score 12.712165