Prediction of user retweets based on social neighborhood information and topic modelling

Authors: Celayes, Pablo Gabriel; Domínguez, Martín Ariel
Year of publication: 2017
Language: English
Resource type: conference paper
Status: published version
Description: Paper presented at the 16th Mexican International Conference on Artificial Intelligence, October 23 to 28, Ensenada, Baja California, Mexico.
Affiliation: Celayes, Pablo Gabriel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.
Affiliation: Domínguez, Martín Ariel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía, Física y Computación; Argentina.
Twitter and other social networks have become a fundamental source of information and a powerful tool to spread ideas and opinions. A crucial step in understanding the mechanisms that drive information diffusion in Twitter is to study the influence of a user's social neighborhood on the construction of her retweeting preferences; in particular, to what extent the preferences of a user can be predicted given the preferences of her neighborhood. We build our own sample graph of Twitter users and study the problem of predicting retweets from a given user based on the retweeting behavior occurring in her second-degree social neighborhood (followed and followed-by-followed). We train and evaluate user-centered binary classification models that predict retweets with an average F1 score of 87.6%, based purely on social information, that is, without analyzing the content of the tweets. For users getting low scores with such models (on a tuning dataset), we improve the results by adding features extracted from the content of tweets. To do so, we apply a Natural Language Processing (NLP) pipeline including a Twitter-specific adaptation of the Latent Dirichlet Allocation (LDA) probabilistic topic model.
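The abstract does not specify the features or the classifier used. Purely as an illustration, the Python sketch below shows one way a user-centered retweet classifier over a second-degree neighborhood could be set up. Everything in it is an assumption rather than the authors' method: the input format (hypothetical `follows` and `retweets` dictionaries), the two features (fraction of followed users and fraction of followed-by-followed users who retweeted a tweet), and the use of a plain logistic regression trained by batch gradient descent. It leaves out the LDA topic features entirely. The `f1_score` helper shows how an F1 score such as the one quoted above is computed from true and predicted labels.

```python
# Hypothetical sketch, not the authors' implementation: neighborhood-based
# retweet features, a tiny logistic regression, and the F1 metric.
import math


def second_degree_neighborhood(follows, user):
    """Split the neighborhood of `user` into followed and followed-by-followed sets.

    `follows` is an assumed adjacency map: user -> set of users they follow.
    """
    followed = set(follows.get(user, set()))
    second = set()
    for f in followed:
        second |= follows.get(f, set())
    # Second-degree users exclude direct followees and the user herself.
    second -= followed | {user}
    return followed, second


def tweet_features(retweets, follows, user):
    """Map each tweet seen in the neighborhood to (features, label).

    `retweets` is an assumed map: tweet id -> set of users who retweeted it.
    Features: fraction of followed users and fraction of second-degree users
    who retweeted the tweet. Label: 1 if `user` retweeted it, else 0.
    """
    followed, second = second_degree_neighborhood(follows, user)
    neighborhood = followed | second
    data = {}
    for tweet, users in retweets.items():
        if not users & neighborhood:
            continue  # the tweet never reached this user's neighborhood
        x1 = len(users & followed) / len(followed) if followed else 0.0
        x2 = len(users & second) / len(second) if second else 0.0
        data[tweet] = ((x1, x2), int(user in users))
    return data


def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w, x):
    """Predict 1 (retweet) when the linear score is positive."""
    return int(sum(wj * xj for wj, xj in zip(w, x)) + w[-1] > 0)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (retweet) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    follows = {"u": {"a", "b"}, "a": {"c", "u"}, "b": {"c", "d"}}
    retweets = {"t1": {"a", "c", "u"}, "t2": {"d"}, "t3": {"x"}}
    # t3 is skipped: its only retweeter "x" is outside u's neighborhood.
    print(tweet_features(retweets, follows, "u"))
    # {'t1': ((0.5, 0.5), 1), 't2': ((0.0, 0.5), 0)}
```

Restricting candidate tweets to those that reached the neighborhood mirrors the idea that a user can only retweet what her social circle exposes her to; in practice one model per target user would be trained on her own `tweet_features` output and scored with `f1_score` on held-out tweets.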
Other Computer and Information Sciences
Subjects: Machine learning
Social networks
Topic modelling
Natural language processing
Access level: open access
Repository
Institution: Universidad Nacional de Córdoba
OAI identifier: oai:rdu.unc.edu.ar:11086/552488

URL: http://hdl.handle.net/11086/552488
Format: application/pdf