Patterns of Markup use in Wikipedia
- Authors
- Martin, Jonathan; Torres, Diego; Fernández, Alejandro
- Year of publication
- 2017
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Wikipedia is a knowledge-building community that lets anyone create and edit articles. While editing articles, users employ visual structure elements (VSEs) to format content. VSEs are part of the Wikipedia markup language. All creation and editing events are recorded in a revision history. An unsupervised learning approach was used to analyze a dataset of more than 2,000,000 revisions of 126,000 articles. Using K-Means clustering and association rule mining, a general classification of revisions was derived. Relevant classes include vandalism revisions, correction revisions, and common revisions. Each class was then studied, and patterns of markup element usage were identified. These results help to identify user intention, and knowledge of how VSEs are used could contribute to improving the text editors currently provided by Wikipedia and, ultimately, to improving editors' activity.
Laboratorio de Investigación y Formación en Informática Avanzada
- Subject
- Computer science
- Pattern mining
- Machine learning
- Unsupervised learning
- Wikipedia
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/155554
View the full record metadata
id |
SEDICI_be5612d4ff9e17061e938aadb4fb154e |
---|---|
oai_identifier_str |
oai:sedici.unlp.edu.ar:10915/155554 |
network_acronym_str |
SEDICI |
repository_id_str |
1329 |
network_name_str |
SEDICI (UNLP) |
dc.title.none.fl_str_mv |
Patterns of Markup use in Wikipedia |
title |
Patterns of Markup use in Wikipedia |
spellingShingle |
Patterns of Markup use in Wikipedia Martin, Jonathan Ciencias Informáticas Pattern mining Machine learning Unsupervised learning Wikipedia |
title_short |
Patterns of Markup use in Wikipedia |
title_full |
Patterns of Markup use in Wikipedia |
title_fullStr |
Patterns of Markup use in Wikipedia |
title_full_unstemmed |
Patterns of Markup use in Wikipedia |
title_sort |
Patterns of Markup use in Wikipedia |
dc.creator.none.fl_str_mv |
Martin, Jonathan Torres, Diego Fernández, Alejandro |
author |
Martin, Jonathan |
author_facet |
Martin, Jonathan Torres, Diego Fernández, Alejandro |
author_role |
author |
author2 |
Torres, Diego Fernández, Alejandro |
author2_role |
author author |
dc.subject.none.fl_str_mv |
Ciencias Informáticas Pattern mining Machine learning Unsupervised learning Wikipedia |
topic |
Ciencias Informáticas Pattern mining Machine learning Unsupervised learning Wikipedia |
dc.description.none.fl_txt_mv |
Wikipedia is a knowledge-building community that lets anyone create and edit articles. While editing articles, users employ visual structure elements (VSEs) to format content. VSEs are part of the Wikipedia markup language. All creation and editing events are recorded in a revision history. An unsupervised learning approach was used to analyze a dataset of more than 2,000,000 revisions of 126,000 articles. Using K-Means clustering and association rule mining, a general classification of revisions was derived. Relevant classes include vandalism revisions, correction revisions, and common revisions. Each class was then studied, and patterns of markup element usage were identified. These results help to identify user intention, and knowledge of how VSEs are used could contribute to improving the text editors currently provided by Wikipedia and, ultimately, to improving editors' activity. Laboratorio de Investigación y Formación en Informática Avanzada |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017-10 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Objeto de conferencia http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://sedici.unlp.edu.ar/handle/10915/155554 |
url |
http://sedici.unlp.edu.ar/handle/10915/155554 |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/isbn/978-1-5386-3483-7 info:eu-repo/semantics/altIdentifier/doi/10.1109/SCCC.2017.8405114 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.format.none.fl_str_mv |
application/pdf |
dc.source.none.fl_str_mv |
reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP |
reponame_str |
SEDICI (UNLP) |
collection |
SEDICI (UNLP) |
instname_str |
Universidad Nacional de La Plata |
instacron_str |
UNLP |
institution |
UNLP |
repository.name.fl_str_mv |
SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv |
alira@sedici.unlp.edu.ar |
_version_ |
1844616276852342784 |
score |
13.070432 |