Language-Agnostic Modeling of Source Reliability on Wikipedia
- Authors
- D'Ignazi, Jacopo; Kaltenbrunner, Andreas; Mejova, Yelena; Tizzani, Michele; Kalimeri, Kyriaki; Beiro, Mariano Gastón; Aragón, Pablo
- Publication year
- 2025
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Over the last few years, verifying the credibility of information sources has become a fundamental need to combat disinformation. Here, we present a language-agnostic model designed to assess the reliability of web domains as sources in references across multiple language editions of Wikipedia. Utilizing editing activity data, the model evaluates domain reliability within different articles of varying controversiality such as Climate Change, COVID-19, History, Media, and Biology topics. Crafting features that express domain usage across articles, the model effectively predicts domain reliability, achieving an F1 Macro score of approximately 0.80 for English and other high-resource languages. For mid-resource languages, we achieve 0.65, while the performance of low-resource languages varies. In all cases, the time the domain remains present in the articles (which we dub permanence) is one of the most predictive features. We highlight the challenge of maintaining consistent model performance across languages of varying resource levels and demonstrate that adapting models from higher-resource languages can improve performance.
Affiliation: D'Ignazi, Jacopo. Universitat Pompeu Fabra; Spain. ISI Foundation; Italy
Affiliation: Kaltenbrunner, Andreas. Universitat Oberta de Catalunya; Spain. ISI Foundation; Italy
Affiliation: Mejova, Yelena. ISI Foundation; Italy
Affiliation: Tizzani, Michele. ISI Foundation; Italy
Affiliation: Kalimeri, Kyriaki. ISI Foundation; Italy
Affiliation: Beiro, Mariano Gastón. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de San Andrés. Departamento de Ingeniería;
Affiliation: Aragón, Pablo. Universitat Pompeu Fabra; Spain
- Subject
- Information Systems; Wikipedia; Machine Learning; Disinformation
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/276962
Full record metadata
| id | CONICETDig_174e836f6ed4d687b8def52ddf75c1ab |
|---|---|
| oai_identifier_str | oai:ri.conicet.gov.ar:11336/276962 |
| network_acronym_str | CONICETDig |
| repository_id_str | 3498 |
| network_name_str | CONICET Digital (CONICET) |
| dc.title.none.fl_str_mv | Language-Agnostic Modeling of Source Reliability on Wikipedia |
| title | Language-Agnostic Modeling of Source Reliability on Wikipedia |
| spellingShingle | Language-Agnostic Modeling of Source Reliability on Wikipedia; D'Ignazi, Jacopo; Information Systems; Wikipedia; Machine Learning; Disinformation |
| title_short | Language-Agnostic Modeling of Source Reliability on Wikipedia |
| title_full | Language-Agnostic Modeling of Source Reliability on Wikipedia |
| title_fullStr | Language-Agnostic Modeling of Source Reliability on Wikipedia |
| title_full_unstemmed | Language-Agnostic Modeling of Source Reliability on Wikipedia |
| title_sort | Language-Agnostic Modeling of Source Reliability on Wikipedia |
| dc.creator.none.fl_str_mv | D'Ignazi, Jacopo; Kaltenbrunner, Andreas; Mejova, Yelena; Tizzani, Michele; Kalimeri, Kyriaki; Beiro, Mariano Gastón; Aragón, Pablo |
| author | D'Ignazi, Jacopo |
| author_facet | D'Ignazi, Jacopo; Kaltenbrunner, Andreas; Mejova, Yelena; Tizzani, Michele; Kalimeri, Kyriaki; Beiro, Mariano Gastón; Aragón, Pablo |
| author_role | author |
| author2 | Kaltenbrunner, Andreas; Mejova, Yelena; Tizzani, Michele; Kalimeri, Kyriaki; Beiro, Mariano Gastón; Aragón, Pablo |
| author2_role | author author author author author author |
| dc.subject.none.fl_str_mv | Information Systems; Wikipedia; Machine Learning; Disinformation |
| topic | Information Systems; Wikipedia; Machine Learning; Disinformation |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/2.2 https://purl.org/becyt/ford/2 |
| publishDate | 2025 |
| dc.date.none.fl_str_mv | 2025-11 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/276962 D'Ignazi, Jacopo; Kaltenbrunner, Andreas; Mejova, Yelena; Tizzani, Michele; Kalimeri, Kyriaki; et al.; Language-Agnostic Modeling of Source Reliability on Wikipedia; Association for Computing Machinery; Acm Transactions On The Web; 2025; 11-2025; 1-30 1559-1131 CONICET Digital CONICET |
| url | http://hdl.handle.net/11336/276962 |
| identifier_str_mv | D'Ignazi, Jacopo; Kaltenbrunner, Andreas; Mejova, Yelena; Tizzani, Michele; Kalimeri, Kyriaki; et al.; Language-Agnostic Modeling of Source Reliability on Wikipedia; Association for Computing Machinery; Acm Transactions On The Web; 2025; 11-2025; 1-30 1559-1131 CONICET Digital CONICET |
| dc.language.none.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://dl.acm.org/doi/10.1145/3777444 info:eu-repo/semantics/altIdentifier/doi/10.1145/3777444 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/2.5/ar/ |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | https://creativecommons.org/licenses/by/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf application/pdf |
| dc.publisher.none.fl_str_mv | Association for Computing Machinery |
| publisher.none.fl_str_mv | Association for Computing Machinery |
| dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| reponame_str | CONICET Digital (CONICET) |
| collection | CONICET Digital (CONICET) |
| instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
| _version_ | 1852335128976556032 |
| score | 12.952241 |