RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations

Authors
Pérez Bianchi, Paula; Anselmo, Sol; Vásquez Currié, Malena; Medel, Jimena; Uelf, Estefanía; Dos Santos, Alicia; Buosi, Noemí; Vargas, Rosana; Reves Szemere, Juliana; Volcovinsky, Bruno; Massaroli, Hugo; Andrade, Manuel; Monastra, Alejandro Gabriel; Iarussi, Emmanuel; Siless, Viviana; Bruno, Luciana
Publication year
2025
Language
English
Resource type
article
Status
published version
Description
The Pap smear remains the primary screening test for cervical cancer in many low-resource regions, yet publicly available image datasets largely feature liquid-based preparations. We introduce RIVA, a high-resolution collection of 959 conventional-smear images (1024 × 1024 px) scanned at 40× magnification, sourced from 115 patients. To ensure label quality, each image was annotated by up to four independent medical professionals, with 42% of the images reviewed by all four, resulting in 26,158 annotations based on the Bethesda classification. Annotations provide nucleus coordinates and classification labels from up to four annotators. The dataset includes 15,949 unique cells across five (pre)cancerous types (SCC, HSIL, ASCH, LSIL, ASCUS) and three non-lesion categories (NILM, ENDO, INFL). These four-expert annotations not only give RIVA a consensus-driven ground truth for robust AI training but also enable inter-annotator consistency analysis: agreement rates reach 94% for lesion vs. non-lesion and 74% across the full eight-category Bethesda scheme.
Affiliation: Pérez Bianchi, Paula. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Anselmo, Sol. Instituto Tecnológico de Buenos Aires; Argentina
Affiliation: Vásquez Currié, Malena. Instituto Tecnológico de Buenos Aires; Argentina
Affiliation: Medel, Jimena. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autónoma de Buenos Aires
Affiliation: Uelf, Estefanía. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autónoma de Buenos Aires
Affiliation: Dos Santos, Alicia. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autónoma de Buenos Aires
Affiliation: Buosi, Noemí. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autónoma de Buenos Aires
Affiliation: Vargas, Rosana. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autónoma de Buenos Aires
Affiliation: Reves Szemere, Juliana. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de San Martín. Escuela de Ciencia y Tecnología; Argentina
Affiliation: Volcovinsky, Bruno. Instituto Tecnológico de Buenos Aires; Argentina
Affiliation: Massaroli, Hugo. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Andrade, Manuel. Universidad Torcuato Di Tella; Argentina
Affiliation: Monastra, Alejandro Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de General Sarmiento. Instituto de Ciencias; Argentina
Affiliation: Iarussi, Emmanuel. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Torcuato Di Tella; Argentina
Affiliation: Siless, Viviana. Universidad Torcuato Di Tella; Argentina
Affiliation: Bruno, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Torcuato Di Tella; Argentina
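The description quotes agreement rates for a lesion vs. non-lesion split and for the full eight-category scheme. This record does not say how those rates were computed, so the following is only an illustrative sketch that assumes pairwise percent agreement (the share of annotator pairs that assign the same label to a cell). The function names and the toy data are hypothetical and are not taken from the RIVA files; only the eight category names come from the description.

```python
from itertools import combinations

# Category groups as listed in the description.
LESION = {"SCC", "HSIL", "ASCH", "LSIL", "ASCUS"}
NON_LESION = {"NILM", "ENDO", "INFL"}


def to_binary(label):
    """Collapse one of the eight categories into lesion / non-lesion."""
    if label in LESION:
        return "lesion"
    if label in NON_LESION:
        return "non-lesion"
    raise ValueError(f"unknown label: {label}")


def pairwise_agreement(cells, binary=False):
    """Fraction of annotator pairs that agree, pooled over all cells.

    `cells` is a list in which each item holds the labels that different
    annotators gave to the same cell (hypothetical layout). Cells with a
    single label contribute no pairs.
    """
    agree = 0
    total = 0
    for labels in cells:
        if binary:
            labels = [to_binary(label) for label in labels]
        for a, b in combinations(labels, 2):
            total += 1
            agree += a == b
    if total == 0:
        raise ValueError("no cell has two or more labels")
    return agree / total


# Hypothetical toy data, not drawn from the dataset.
toy_cells = [
    ["HSIL", "HSIL", "ASCH"],
    ["NILM", "NILM"],
    ["LSIL", "ASCUS", "LSIL", "NILM"],
    ["INFL"],
]

full_rate = pairwise_agreement(toy_cells)
binary_rate = pairwise_agreement(toy_cells, binary=True)
print(f"eight-category agreement: {full_rate:.2f}")
print(f"lesion vs. non-lesion agreement: {binary_rate:.2f}")
```

On the toy data the binary rate is higher than the eight-category rate, because disagreements between two lesion grades (for example HSIL vs. ASCH) disappear once the categories are collapsed. Chance-corrected measures such as Cohen's or Fleiss' kappa would be an alternative to raw percent agreement.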
Subject
Pap Smear Cytology Dataset
Bethesda classification
Digital Pathology
Multiple Annotations Dataset
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/280531

id CONICETDig_4fd3fd9a9205ab9979ec4b56ea8492ce
oai_identifier_str oai:ri.conicet.gov.ar:11336/280531
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2025
dc.date.none.fl_str_mv 2025-12
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/280531
Pérez Bianchi, Paula; Anselmo, Sol; Vásquez Currié, Malena; Medel, Jimena; Uelf, Estefanía; et al.; RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations; Springer; Scientific Data; 12; 1; 12-2025; 1-7
2052-4463
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://www.nature.com/articles/s41597-025-06280-2
info:eu-repo/semantics/altIdentifier/doi/10.1038/s41597-025-06280-2
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv Springer
publisher.none.fl_str_mv Springer
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar