On the Assessment of Information Quality in Spanish Wikipedia

Authors
Urquiza, Guido; Soria, Matías; Pérez Casseignau, Sebastián; Ferretti, Edgardo; Gómez, Sergio Alejandro; Errecalde, Marcelo Luis
Publication year
2016
Language
English
Resource type
conference paper
Status
published version
Description
Featured Articles (FA) are considered the best articles that Wikipedia has to offer, and in recent years researchers have found it interesting to analyze whether and how they can be distinguished from “ordinary” articles. Likewise, identifying which issues must be enhanced or fixed in ordinary articles to improve their quality is a recent key research trend. Most of the approaches developed along these lines have been proposed for the English Wikipedia. However, little work has been done on Spanish Wikipedia, even though Spanish is one of the most widely spoken languages in the world by number of native speakers. In this respect, we present a first breakdown of Spanish Wikipedia’s quality flaw structure. In addition, we carry out a study to automatically assess information quality in Spanish Wikipedia, where FA identification is evaluated as a binary classification task. The results show that FA identification can be performed with an F1 score of 0.81, using a document model consisting of only twenty-six features and AdaBoosted C4.5 decision trees as the classification algorithm.
XIII Workshop Bases de datos y Minería de Datos (WBDMD)
Red de Universidades con Carreras en Informática (RedUNCI)
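As a reading aid for the F1 figure quoted in the description, the following is a minimal sketch of how an F1 score is computed for a binary task such as FA versus non-FA. The function name `f1_score` and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall for the positive class.

    tp: positives correctly identified (e.g. FAs labeled as FA)
    fp: negatives wrongly labeled positive
    fn: positives that were missed
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, NOT results from the paper.
example = f1_score(tp=40, fp=10, fn=10)
print(f"F1 = {example:.2f}")
```

F1 ignores true negatives, so it reflects only how well the positive class (here, featured articles) is recovered.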
Subject
Computer Science
Featured Articles (FA)
Wikipedia
Quality Flaws Prediction
Access level
open access
Terms of use
http://creativecommons.org/licenses/by/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/56750

Date
2016-10
Format
application/pdf
702-711
Handle
http://sedici.unlp.edu.ar/handle/10915/56750
Related resource
info:eu-repo/semantics/reference/hdl/10915/55718
License
Creative Commons Attribution 4.0 International (CC BY 4.0)
Repository contact
alira@sedici.unlp.edu.ar