Language-Agnostic Modeling of Source Reliability on Wikipedia

Authors
D'Ignazi, Jacopo; Kaltenbrunner, Andreas; Mejova, Yelena; Tizzani, Michele; Kalimeri, Kyriaki; Beiro, Mariano Gastón; Aragón, Pablo
Publication year
2025
Language
English
Resource type
article
Status
published version
Description
Over the last few years, verifying the credibility of information sources has become a fundamental need to combat disinformation. Here, we present a language-agnostic model designed to assess the reliability of web domains as sources in references across multiple language editions of Wikipedia. Utilizing editing activity data, the model evaluates domain reliability within different articles of varying controversiality, such as Climate Change, COVID-19, History, Media, and Biology topics. Crafting features that express domain usage across articles, the model effectively predicts domain reliability, achieving an F1 Macro score of approximately 0.80 for English and other high-resource languages. For mid-resource languages, we achieve 0.65, while the performance of low-resource languages varies. In all cases, the time the domain remains present in the articles (which we dub permanence) is one of the most predictive features. We highlight the challenge of maintaining consistent model performance across languages of varying resource levels and demonstrate that adapting models from higher-resource languages can improve performance.
Affiliation: D'Ignazi, Jacopo. Universitat Pompeu Fabra; Spain. ISI Foundation; Italy
Affiliation: Kaltenbrunner, Andreas. Universitat Oberta de Catalunya; Spain. ISI Foundation; Italy
Affiliation: Mejova, Yelena. ISI Foundation; Italy
Affiliation: Tizzani, Michele. ISI Foundation; Italy
Affiliation: Kalimeri, Kyriaki. ISI Foundation; Italy
Affiliation: Beiro, Mariano Gastón. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de San Andrés. Departamento de Ingeniería;
Affiliation: Aragón, Pablo. Universitat Pompeu Fabra; Spain
Subject
Information Systems
Wikipedia
Machine Learning
Disinformation
Access level
open access
Terms of use
https://creativecommons.org/licenses/by/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/276962

id CONICETDig_174e836f6ed4d687b8def52ddf75c1ab
network_acronym_str CONICETDig
repository_id_str 3498
dc.title.none.fl_str_mv Language-Agnostic Modeling of Source Reliability on Wikipedia
dc.creator.none.fl_str_mv D'Ignazi, Jacopo
Kaltenbrunner, Andreas
Mejova, Yelena
Tizzani, Michele
Kalimeri, Kyriaki
Beiro, Mariano Gastón
Aragón, Pablo
dc.subject.none.fl_str_mv Information Systems
Wikipedia
Machine Learning
Disinformation
purl_subject.fl_str_mv https://purl.org/becyt/ford/2.2
https://purl.org/becyt/ford/2
publishDate 2025
dc.date.none.fl_str_mv 2025-11
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/276962
D'Ignazi, Jacopo; Kaltenbrunner, Andreas; Mejova, Yelena; Tizzani, Michele; Kalimeri, Kyriaki; et al.; Language-Agnostic Modeling of Source Reliability on Wikipedia; Association for Computing Machinery; ACM Transactions on the Web; 2025; 11-2025; 1-30
1559-1131
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://dl.acm.org/doi/10.1145/3777444
info:eu-repo/semantics/altIdentifier/doi/10.1145/3777444
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by/2.5/ar/
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Association for Computing Machinery
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar