Patterns of Markup use in Wikipedia

Authors
Martin, Jonathan; Torres, Diego; Fernández, Alejandro
Publication year
2017
Language
English
Resource type
conference paper
Status
published version
Description
Wikipedia is a knowledge-building community that lets anyone create and edit articles. While editing articles, users employ visual structure elements (VSEs) to format content. VSEs are part of the Wikipedia markup language. All creation and editing events are recorded in a revision history. An unsupervised learning approach was used to analyze a dataset of more than 2,000,000 revisions of 126,000 articles. Using K-Means clustering and association rule mining, a general classification of revisions was derived. Relevant classes include vandalism revisions, correction revisions, and common revisions. Each class was then studied, and patterns of markup element usage were identified. These results help to identify user intention, and knowledge of VSE use could contribute to improving the text editors currently provided by Wikipedia and, ultimately, editors' activity.
Laboratorio de Investigación y Formación en Informática Avanzada
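The clustering step named in the description can be illustrated with a small sketch. The record does not state the feature set, the number of clusters, or the tooling the authors used, so everything below is an assumption: each revision is represented by three hypothetical counts (links added, templates added, characters removed), the values are invented, and K-Means is written out in plain Python (Lloyd's algorithm) rather than taken from any particular library.

```python
# Hypothetical sketch: group revisions by simple per-revision markup counts.
# Feature names and values are invented for illustration; they are not
# taken from the paper's dataset.
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute means."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids


# Each row is one revision: (links added, templates added, characters removed).
revisions = [
    (0, 0, 900),   # large deletion, no markup
    (0, 0, 1100),
    (1, 0, 5),     # small fix
    (0, 1, 10),
    (6, 3, 0),     # markup-heavy addition
    (7, 2, 2),
]

# Seed with one revision from each apparent group so the run is deterministic.
labels, centroids = kmeans(revisions, [revisions[0], revisions[2], revisions[4]])
print(labels)
```

Which real-world class (vandalism, correction, common) a cluster corresponds to is not decided by the algorithm; in this sketch that interpretation would still have to be made by inspecting the members of each cluster.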
Subject
Computer Science
Pattern mining
Machine learning
Unsupervised learning
Wikipedia
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/155554

id SEDICI_be5612d4ff9e17061e938aadb4fb154e
oai_identifier_str oai:sedici.unlp.edu.ar:10915/155554
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
dc.title.none.fl_str_mv Patterns of Markup use in Wikipedia
dc.creator.none.fl_str_mv Martin, Jonathan
Torres, Diego
Fernández, Alejandro
dc.subject.none.fl_str_mv Ciencias Informáticas
Pattern mining
Machine learning
Unsupervised learning
Wikipedia
dc.description.none.fl_txt_mv Wikipedia is a knowledge-building community that lets anyone create and edit articles. While editing articles, users employ visual structure elements (VSEs) to format content. VSEs are part of the Wikipedia markup language. All creation and editing events are recorded in a revision history. An unsupervised learning approach was used to analyze a dataset of more than 2,000,000 revisions of 126,000 articles. Using K-Means clustering and association rule mining, a general classification of revisions was derived. Relevant classes include vandalism revisions, correction revisions, and common revisions. Each class was then studied, and patterns of markup element usage were identified. These results help to identify user intention, and knowledge of VSE use could contribute to improving the text editors currently provided by Wikipedia and, ultimately, editors' activity.
Laboratorio de Investigación y Formación en Informática Avanzada
publishDate 2017
dc.date.none.fl_str_mv 2017-10
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Objeto de conferencia
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
format conferenceObject
status_str publishedVersion
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/155554
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/isbn/978-1-5386-3483-7
info:eu-repo/semantics/altIdentifier/doi/10.1109/SCCC.2017.8405114
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
eu_rights_str_mv openAccess
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar