Prediction of user retweets based on social neighborhood information and topic modelling
- Authors
- Celayes, Pablo Gabriel; Domínguez, Martín Ariel
- Publication year
- 2017
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
Paper presented at the 16th Mexican International Conference on Artificial Intelligence, October 23 to 28, Ensenada, Baja California, Mexico.
Affiliation: Celayes, Pablo Gabriel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.
Affiliation: Domínguez, Martín Ariel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.
Twitter and other social networks have become a fundamental source of information and a powerful tool to spread ideas and opinions. A crucial step in understanding the mechanisms that drive information diffusion in Twitter is to study the influence of a user's social neighborhood on the construction of her retweeting preferences. In particular, we ask to what extent the preferences of a user can be predicted given the preferences of her neighborhood. We build our own sample graph of Twitter users and study the problem of predicting retweets from a given user based on the retweeting behavior occurring in her second-degree social neighborhood (followed and followed-by-followed). We manage to train and evaluate user-centered binary classification models that predict retweets with an average F1 score of 87.6%, based purely on social information, that is, without analyzing the content of the tweets. For users getting low scores with such models (on a tuning dataset), we improve the results by adding features extracted from the content of tweets. To do so, we apply a Natural Language Processing (NLP) pipeline including a Twitter-specific adaptation of the Latent Dirichlet Allocation (LDA) probabilistic topic model.
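The social-only setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy follow graph, the user names, the retweet sets, and the threshold rule standing in for a trained classifier are all assumptions made for illustration. It shows how a second-degree neighborhood (followed and followed-by-followed) might be collected, how one binary feature per neighbor could be built for each tweet, and how an F1 score is computed for a single user.

```python
def second_degree_neighborhood(follows, user):
    """Accounts followed by `user`, plus accounts followed by those accounts."""
    first = set(follows.get(user, ()))
    second = set()
    for followed in first:
        second.update(follows.get(followed, ()))
    return (first | second) - {user}


def feature_vector(tweet_id, neighbors, retweets):
    """One binary feature per neighbor: 1 if that neighbor retweeted the tweet."""
    return [1 if tweet_id in retweets.get(n, set()) else 0 for n in neighbors]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: who follows whom, and which tweet ids each user retweeted.
follows = {
    "ana": ["bob", "carla"],
    "bob": ["dan"],
    "carla": ["dan", "eva"],
    "dan": [],
    "eva": ["ana"],
}
retweets = {
    "ana": {1, 2},
    "bob": {1, 2},
    "carla": {1, 3},
    "dan": {1, 2, 4},
    "eva": {3},
}
tweets = [1, 2, 3, 4]

user = "ana"
neighbors = sorted(second_degree_neighborhood(follows, user))
X = [feature_vector(t, neighbors, retweets) for t in tweets]
y_true = [1 if t in retweets[user] else 0 for t in tweets]

# Assumed stand-in for a trained model: predict a retweet when at least
# two neighbors retweeted the same tweet.
y_pred = [1 if sum(row) >= 2 else 0 for row in X]
score = f1_score(y_true, y_pred)
```

In practice the threshold rule would be replaced by a classifier trained separately for each user, and the scores would be averaged over users, which is how an average F1 figure such as the one reported above would be obtained.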
Other Computer and Information Sciences
- Subject
- Machine learning
Social networks
Topic modelling
Natural language processing
- Access level
- open access
- Terms of use
- Repository
- Institution
- Universidad Nacional de Córdoba
- OAI identifier
- oai:rdu.unc.edu.ar:11086/552488
id |
RDUUNC_2358afae318f97dc316b59bdb1090fbb |
---|---|
oai_identifier_str |
oai:rdu.unc.edu.ar:11086/552488 |
network_acronym_str |
RDUUNC |
repository_id_str |
2572 |
network_name_str |
Repositorio Digital Universitario (UNC) |
dc.title.none.fl_str_mv |
Prediction of user retweets based on social neighborhood information and topic modelling |
dc.creator.none.fl_str_mv |
Celayes, Pablo Gabriel Domínguez, Martín Ariel |
author |
Celayes, Pablo Gabriel |
author_facet |
Celayes, Pablo Gabriel Domínguez, Martín Ariel |
author_role |
author |
author2 |
Domínguez, Martín Ariel |
author2_role |
author |
dc.subject.none.fl_str_mv |
Machine learning Social networks Topic modelling Natural language processing |
topic |
Machine learning Social networks Topic modelling Natural language processing |
description |
Paper presented at the 16th Mexican International Conference on Artificial Intelligence, October 23 to 28, Ensenada, Baja California, Mexico. |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11086/552488 |
url |
http://hdl.handle.net/11086/552488 |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:Repositorio Digital Universitario (UNC) instname:Universidad Nacional de Córdoba instacron:UNC |
reponame_str |
Repositorio Digital Universitario (UNC) |
collection |
Repositorio Digital Universitario (UNC) |
instname_str |
Universidad Nacional de Córdoba |
instacron_str |
UNC |
institution |
UNC |
repository.name.fl_str_mv |
Repositorio Digital Universitario (UNC) - Universidad Nacional de Córdoba |
repository.mail.fl_str_mv |
oca.unc@gmail.com |
_version_ |
1846143402730586112 |
score |
12.712165 |