Towards Information Quality Assurance in Spanish: Wikipedia

Autores
Ferretti, Edgardo; Soria, Matías; Pérez Casseignau, Sebastián; Pohn, Lian; Urquiza, Guido; Gómez, Sergio Alejandro; Errecalde, Marcelo
Año de publicación
2017
Idioma
inglés
Tipo de recurso
artículo
Estado
versión publicada
Descripción
Featured Articles (FA) are considered to be the best articles that Wikipedia has to offer and in the last years, researchers have found interesting to analyze whether and how they can be distinguished from “ordinary” articles. Likewise, identifying what issues have to be enhanced or fixed in ordinary articles in order to improve their quality is a recent key research trend. Most of the approaches developed to face these information quality problems have been proposed for the English Wikipedia. However, few efforts have been accomplished in Spanish Wikipedia, despite being Spanish, one of the most spoken languages in the world by native speakers. In this respect, we present a breakdown of Spanish Wikipedia’s quality flaw structure. Besides, we carry out studies with three different corpora to automatically assess information quality in Spanish Wikipedia, where FA identification is evaluated as a binary classification task. Our evaluation on a unified setting allows to compare with the English version, the performance achieved by our approach on the Spanish version. The best results obtained show that FA identification in Spanish, can be performed with an F1 score of 0.88 using a document model consisting of only twenty six features and Support Vector Machine as classification algorithm.
Materia
Ciencias de la Computación e Información
featured article identification
information quality
quality flaws prediction
Wikipedia
Nivel de accesibilidad
acceso abierto
Condiciones de uso
http://creativecommons.org/licenses/by/4.0/
Repositorio
CIC Digital (CICBA)
Institución
Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
OAI Identificador
oai:digital.cic.gba.gob.ar:11746/5668

id CICBA_db22c32439b9c553ff557b14731dcae1
oai_identifier_str oai:digital.cic.gba.gob.ar:11746/5668
network_acronym_str CICBA
repository_id_str 9441
network_name_str CIC Digital (CICBA)
spelling Towards Information Quality Assurance in Spanish: WikipediaFerretti, EdgardoSoria, MatíasPérez Casseignau, SebastiánPohn, LianUrquiza, GuidoGómez, Sergio AlejandroErrecalde, MarceloCiencias de la Computación e Informaciónfeatured article identificationinformation qualityquality flaws predictionWikipediaFeatured Articles (FA) are considered to be the best articles that Wikipedia has to offer and in the last years, researchers have found interesting to analyze whether and how they can be distinguished from “ordinary” articles. Likewise, identifying what issues have to be enhanced or fixed in ordinary articles in order to improve their quality is a recent key research trend. Most of the approaches developed to face these information quality problems have been proposed for the English Wikipedia. However, few efforts have been accomplished in Spanish Wikipedia, despite being Spanish, one of the most spoken languages in the world by native speakers. In this respect, we present a breakdown of Spanish Wikipedia’s quality flaw structure. Besides, we carry out studies with three different corpora to automatically assess information quality in Spanish Wikipedia, where FA identification is evaluated as a binary classification task. Our evaluation on a unified setting allows to compare with the English version, the performance achieved by our approach on the Spanish version. The best results obtained show that FA identification in Spanish, can be performed with an F1 score of 0.88 using a document model consisting of only twenty six features and Support Vector Machine as classification algorithm.2017-04info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionhttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdfhttps://digital.cic.gba.gob.ar/handle/11746/5668enginfo:eu-repo/semantics/altIdentifier/issn/1666-6038info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by/4.0/reponame:CIC Digital (CICBA)instname:Comisión de Investigaciones Científicas de la Provincia de Buenos Airesinstacron:CICBA2025-09-29T13:40:20Zoai:digital.cic.gba.gob.ar:11746/5668Institucionalhttp://digital.cic.gba.gob.arOrganismo científico-tecnológicoNo correspondehttp://digital.cic.gba.gob.ar/oai/snrdmarisa.degiusti@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:94412025-09-29 13:40:20.546CIC Digital (CICBA) - Comisión de Investigaciones Científicas de la Provincia de Buenos Airesfalse
dc.title.none.fl_str_mv Towards Information Quality Assurance in Spanish: Wikipedia
title Towards Information Quality Assurance in Spanish: Wikipedia
spellingShingle Towards Information Quality Assurance in Spanish: Wikipedia
Ferretti, Edgardo
Ciencias de la Computación e Información
featured article identification
information quality
quality flaws prediction
Wikipedia
title_short Towards Information Quality Assurance in Spanish: Wikipedia
title_full Towards Information Quality Assurance in Spanish: Wikipedia
title_fullStr Towards Information Quality Assurance in Spanish: Wikipedia
title_full_unstemmed Towards Information Quality Assurance in Spanish: Wikipedia
title_sort Towards Information Quality Assurance in Spanish: Wikipedia
dc.creator.none.fl_str_mv Ferretti, Edgardo
Soria, Matías
Pérez Casseignau, Sebastián
Pohn, Lian
Urquiza, Guido
Gómez, Sergio Alejandro
Errecalde, Marcelo
author Ferretti, Edgardo
author_facet Ferretti, Edgardo
Soria, Matías
Pérez Casseignau, Sebastián
Pohn, Lian
Urquiza, Guido
Gómez, Sergio Alejandro
Errecalde, Marcelo
author_role author
author2 Soria, Matías
Pérez Casseignau, Sebastián
Pohn, Lian
Urquiza, Guido
Gómez, Sergio Alejandro
Errecalde, Marcelo
author2_role author
author
author
author
author
author
dc.subject.none.fl_str_mv Ciencias de la Computación e Información
featured article identification
information quality
quality flaws prediction
Wikipedia
topic Ciencias de la Computación e Información
featured article identification
information quality
quality flaws prediction
Wikipedia
dc.description.none.fl_txt_mv Featured Articles (FA) are considered to be the best articles that Wikipedia has to offer and in the last years, researchers have found interesting to analyze whether and how they can be distinguished from “ordinary” articles. Likewise, identifying what issues have to be enhanced or fixed in ordinary articles in order to improve their quality is a recent key research trend. Most of the approaches developed to face these information quality problems have been proposed for the English Wikipedia. However, few efforts have been accomplished in Spanish Wikipedia, despite being Spanish, one of the most spoken languages in the world by native speakers. In this respect, we present a breakdown of Spanish Wikipedia’s quality flaw structure. Besides, we carry out studies with three different corpora to automatically assess information quality in Spanish Wikipedia, where FA identification is evaluated as a binary classification task. Our evaluation on a unified setting allows to compare with the English version, the performance achieved by our approach on the Spanish version. The best results obtained show that FA identification in Spanish, can be performed with an F1 score of 0.88 using a document model consisting of only twenty six features and Support Vector Machine as classification algorithm.
description Featured Articles (FA) are considered to be the best articles that Wikipedia has to offer and in the last years, researchers have found interesting to analyze whether and how they can be distinguished from “ordinary” articles. Likewise, identifying what issues have to be enhanced or fixed in ordinary articles in order to improve their quality is a recent key research trend. Most of the approaches developed to face these information quality problems have been proposed for the English Wikipedia. However, few efforts have been accomplished in Spanish Wikipedia, despite being Spanish, one of the most spoken languages in the world by native speakers. In this respect, we present a breakdown of Spanish Wikipedia’s quality flaw structure. Besides, we carry out studies with three different corpora to automatically assess information quality in Spanish Wikipedia, where FA identification is evaluated as a binary classification task. Our evaluation on a unified setting allows to compare with the English version, the performance achieved by our approach on the Spanish version. The best results obtained show that FA identification in Spanish, can be performed with an F1 score of 0.88 using a document model consisting of only twenty six features and Support Vector Machine as classification algorithm.
publishDate 2017
dc.date.none.fl_str_mv 2017-04
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv https://digital.cic.gba.gob.ar/handle/11746/5668
url https://digital.cic.gba.gob.ar/handle/11746/5668
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/issn/1666-6038
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by/4.0/
eu_rights_str_mv openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by/4.0/
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:CIC Digital (CICBA)
instname:Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
instacron:CICBA
reponame_str CIC Digital (CICBA)
collection CIC Digital (CICBA)
instname_str Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
instacron_str CICBA
institution CICBA
repository.name.fl_str_mv CIC Digital (CICBA) - Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
repository.mail.fl_str_mv marisa.degiusti@sedici.unlp.edu.ar
_version_ 1844618617709133824
score 13.070432