On the Assessment of Information Quality in Spanish Wikipedia
- Authors
- Urquiza, Guido; Soria, Matías; Pérez Casseignau, Sebastián; Ferretti, Edgardo; Gómez, Sergio Alejandro; Errecalde, Marcelo Luis
- Year of publication
- 2016
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Featured Articles (FA) are considered the best articles that Wikipedia has to offer, and in recent years researchers have found it interesting to analyze whether and how they can be distinguished from “ordinary” articles. Likewise, identifying which issues must be enhanced or fixed in ordinary articles to improve their quality is a recent key research trend. Most of the approaches developed in these research trends have been proposed for the English Wikipedia. However, few efforts have addressed the Spanish Wikipedia, even though Spanish is one of the most widely spoken languages in the world by number of native speakers. In this respect, we present a first breakdown of the Spanish Wikipedia’s quality flaw structure. In addition, we carry out a study to automatically assess information quality in the Spanish Wikipedia, where FA identification is evaluated as a binary classification task. The results show that FAs can be identified with an F1 score of 0.81, using a document model of only twenty-six features and AdaBoosted C4.5 decision trees as the classification algorithm.
XIII Workshop Bases de datos y Minería de Datos (WBDMD), Red de Universidades con Carreras en Informática (RedUNCI)
- Subjects
- Ciencias Informáticas (Computer Science)
- Featured Articles (FA)
- Wikipedia
- Quality Flaws Prediction
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by/4.0/
- Repository
- SEDICI (UNLP)
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/56750
- Date
- 2016-10
- Format
- application/pdf
- Pages
- 702-711
- Handle
- http://sedici.unlp.edu.ar/handle/10915/56750
- Related record
- info:eu-repo/semantics/reference/hdl/10915/55718
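The abstract frames FA identification as a binary classification task, solved with AdaBoost over C4.5 decision trees and scored with F1 (the harmonic mean of precision and recall, F1 = 2PR / (P + R)). The following is a minimal, self-contained sketch of that setup, not the authors' implementation. For brevity it swaps in one-level decision stumps as the weak learner instead of C4.5 trees. The function names (`adaboost`, `predict`, `f1_score`), the +1/-1 label encoding (FA / non-FA), and the two-feature toy data are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical weak learner: a decision stump (feature index, threshold, polarity).
# NOTE: the paper boosts C4.5 trees; stumps are swapped in here only to keep the sketch short.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: List[float]) -> int:
    """Return +1 (FA) or -1 (non-FA) for one feature vector."""
    j, thr, pol = stump
    return pol if x[j] > thr else -pol


def best_stump(X: List[List[float]], y: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = (0, 0.0, 1), float("inf")
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                s = (j, thr, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(s, xi) != yi)
                if err < best_err:
                    best, best_err = s, err
    return best, best_err


def adaboost(X: List[List[float]], y: List[int], rounds: int = 10) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost with stumps; labels must be +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0) / division by zero
        if err >= 0.5:  # weak learner is no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: List[float]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


def f1_score(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """F1 of the positive (FA) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, made-up data: two hypothetical per-article features.
    X = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [8.0, 7.0], [9.0, 8.0], [7.0, 9.0]]
    y = [-1, -1, -1, 1, 1, 1]
    model = adaboost(X, y, rounds=5)
    preds = [predict(model, x) for x in X]
    print(preds, f1_score(y, preds))  # expected: [-1, -1, -1, 1, 1, 1] 1.0
```

A real reproduction would replace the stumps with deeper trees, use the twenty-six features of the paper's document model, and report F1 on held-out data rather than on the training set.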