Towards Information Quality Assurance in Spanish: Wikipedia
- Authors
- Ferretti, Edgardo; Soria, Matías; Pérez Casseignau, Sebastián; Pohn, Lian; Urquiza, Guido; Gómez, Sergio Alejandro; Errecalde, Marcelo Luis
- Publication year
- 2017
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Featured Articles (FA) are considered the best articles that Wikipedia has to offer, and in recent years researchers have found it interesting to analyze whether and how they can be distinguished from “ordinary” articles. Likewise, identifying which issues have to be enhanced or fixed in ordinary articles in order to improve their quality is a recent key research trend. Most of the approaches developed to address these information quality problems have been proposed for the English Wikipedia. However, few efforts have been made for the Spanish Wikipedia, despite Spanish being one of the most spoken languages in the world by number of native speakers. In this respect, we present a breakdown of Spanish Wikipedia’s quality flaw structure. In addition, we carry out studies with three different corpora to automatically assess information quality in Spanish Wikipedia, where FA identification is evaluated as a binary classification task. Our evaluation in a unified setting allows the performance achieved by our approach on the Spanish version to be compared with that on the English version. The best results obtained show that FA identification in Spanish can be performed with an F1 score of 0.88 using a document model consisting of only twenty-six features and a Support Vector Machine as the classification algorithm.
Facultad de Informática
- Subject
- Ciencias Informáticas; featured article identification; information quality; quality flaws prediction; Wikipedia
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by/3.0/
- Repository
- SEDICI (UNLP)
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/59979
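The OAI identifier above is the handle an OAI-PMH harvester would use to request this record. The following is a minimal sketch that only builds the request URL and makes no network call. It assumes the endpoint `http://sedici.unlp.edu.ar/oai/snrd`, which appears in this record's index data but has not been verified to be live, and the `oai_dc` metadata prefix, which the OAI-PMH protocol requires every repository to support. The function name `get_record_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: endpoint copied from this record's index data; not verified to be live.
BASE_URL = "http://sedici.unlp.edu.ar/oai/snrd"
IDENTIFIER = "oai:sedici.unlp.edu.ar:10915/59979"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper, no network access)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


url = get_record_url(BASE_URL, IDENTIFIER)

# Round-trip check: decoding the query string recovers the original identifier.
params = parse_qs(urlsplit(url).query)
print(url)
print(params["identifier"][0])
```

The identifier contains `:` and `/`, so it is percent-encoded in the query string; decoding the query returns it unchanged.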
Full record metadata
| id | SEDICI_2468c0a2062018fa183ae6c16cdaa772 |
|---|---|
| oai_identifier_str | oai:sedici.unlp.edu.ar:10915/59979 |
| network_acronym_str | SEDICI |
| repository_id_str | 1329 |
| network_name_str | SEDICI (UNLP) |
| spelling | Towards Information Quality Assurance in Spanish: Wikipedia; Ferretti, Edgardo; Soria, Matías; Pérez Casseignau, Sebastián; Pohn, Lian; Urquiza, Guido; Gómez, Sergio Alejandro; Errecalde, Marcelo Luis; Ciencias Informáticas; featured article identification; information quality; quality flaws prediction; Wikipedia; Facultad de Informática; 2017-04; info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; Articulo; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo; application/pdf; 29-36; http://sedici.unlp.edu.ar/handle/10915/59979; eng; info:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/2017/05/JCST-44-Paper-4.pdf; info:eu-repo/semantics/altIdentifier/issn/1666-6038; info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/3.0/; Creative Commons Attribution 3.0 Unported (CC BY 3.0); reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP; 2025-09-29T11:07:15Z; oai:sedici.unlp.edu.ar:10915/59979; Institucional; http://sedici.unlp.edu.ar/; Universidad pública; No corresponde; http://sedici.unlp.edu.ar/oai/snrd; alira@sedici.unlp.edu.ar; Argentina; No corresponde; No corresponde; No corresponde; opendoar:1329; 2025-09-29 11:07:15.856; SEDICI (UNLP) - Universidad Nacional de La Plata; false |
| dc.title.none.fl_str_mv | Towards Information Quality Assurance in Spanish: Wikipedia |
| title | Towards Information Quality Assurance in Spanish: Wikipedia |
| spellingShingle | Towards Information Quality Assurance in Spanish: Wikipedia; Ferretti, Edgardo; Ciencias Informáticas; featured article identification; information quality; quality flaws prediction; Wikipedia |
| title_short | Towards Information Quality Assurance in Spanish: Wikipedia |
| title_full | Towards Information Quality Assurance in Spanish: Wikipedia |
| title_fullStr | Towards Information Quality Assurance in Spanish: Wikipedia |
| title_full_unstemmed | Towards Information Quality Assurance in Spanish: Wikipedia |
| title_sort | Towards Information Quality Assurance in Spanish: Wikipedia |
| dc.creator.none.fl_str_mv | Ferretti, Edgardo; Soria, Matías; Pérez Casseignau, Sebastián; Pohn, Lian; Urquiza, Guido; Gómez, Sergio Alejandro; Errecalde, Marcelo Luis |
| author | Ferretti, Edgardo |
| author_facet | Ferretti, Edgardo; Soria, Matías; Pérez Casseignau, Sebastián; Pohn, Lian; Urquiza, Guido; Gómez, Sergio Alejandro; Errecalde, Marcelo Luis |
| author_role | author |
| author2 | Soria, Matías; Pérez Casseignau, Sebastián; Pohn, Lian; Urquiza, Guido; Gómez, Sergio Alejandro; Errecalde, Marcelo Luis |
| author2_role | author; author; author; author; author; author |
| dc.subject.none.fl_str_mv | Ciencias Informáticas; featured article identification; information quality; quality flaws prediction; Wikipedia |
| topic | Ciencias Informáticas; featured article identification; information quality; quality flaws prediction; Wikipedia |
| dc.description.none.fl_txt_mv | Facultad de Informática |
| publishDate | 2017 |
| dc.date.none.fl_str_mv | 2017-04 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; Articulo; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/59979 |
| url | http://sedici.unlp.edu.ar/handle/10915/59979 |
| dc.language.none.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/2017/05/JCST-44-Paper-4.pdf; info:eu-repo/semantics/altIdentifier/issn/1666-6038 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/3.0/; Creative Commons Attribution 3.0 Unported (CC BY 3.0) |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | http://creativecommons.org/licenses/by/3.0/; Creative Commons Attribution 3.0 Unported (CC BY 3.0) |
| dc.format.none.fl_str_mv | application/pdf; 29-36 |
| dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
| reponame_str | SEDICI (UNLP) |
| collection | SEDICI (UNLP) |
| instname_str | Universidad Nacional de La Plata |
| instacron_str | UNLP |
| institution | UNLP |
| repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
| repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |
| _version_ | 1844615943865499648 |
| score | 13.070432 |
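
The description of this record reports FA identification as a binary classification task evaluated with an F1 score of 0.88. As a reminder of what that figure measures, here is a minimal sketch of F1 computed from confusion-matrix counts for the positive (featured) class. The counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the function name `f1_score` is an assumption of this sketch.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 for the positive class from confusion-matrix counts.

    tp: featured articles correctly identified
    fp: ordinary articles wrongly labeled as featured
    fn: featured articles missed by the classifier
    """
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, NOT results from the paper.
tp, fp, fn = 88, 12, 12
score = f1_score(tp, fp, fn)
print(round(score, 2))  # 0.88
```

Because F1 is the harmonic mean of precision and recall, it penalizes a classifier that does well on only one of the two.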