RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations
- Authors
- Pérez Bianchi, Paula; Anselmo, Sol; Vásquez Currié, Malena; Medel, Jimena; Uelf, Estefanía; Dos Santos, Alicia; Buosi, Noemí; Vargas, Rosana; Reves Szemere, Juliana; Volcovinsky, Bruno; Massaroli, Hugo; Andrade, Manuel; Monastra, Alejandro Gabriel; Iarussi, Emmanuel; Siless, Viviana; Bruno, Luciana
- Publication year
- 2025
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- The Pap smear remains the primary screening test for cervical cancer in many low-resource regions, yet publicly available image datasets largely feature liquid-based preparations. We introduce RIVA, a high-resolution collection of 959 conventional-smear images (1024 × 1024 px) scanned at 40x magnification, sourced from 115 patients. To ensure label quality, each image was annotated by up to four independent medical professionals, with 42% of the images reviewed by all four, resulting in 26,158 annotations based on the Bethesda classification. Annotations provide coordinates of nuclei and classification labels by up to four annotators. The dataset includes 15,949 unique cells across five (pre)cancerous types (SCC, HSIL, ASCH, LSIL, ASCUS) and three non-lesion categories (NILM, ENDO, INFL). These four-expert annotations not only give RIVA a consensus-driven ground truth for robust AI training but also enable inter-annotator consistency analysis: agreement rates reach 94% for lesion vs. non-lesion and 74% across the full eight-category Bethesda scheme.
Affiliation: Pérez Bianchi, Paula. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Anselmo, Sol. Instituto Tecnológico de Buenos Aires; Argentina
Affiliation: Vásquez Currié, Malena. Instituto Tecnológico de Buenos Aires; Argentina
Affiliation: Medel, Jimena. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires
Affiliation: Uelf, Estefanía. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires
Affiliation: Dos Santos, Alicia. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires
Affiliation: Buosi, Noemí. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires
Affiliation: Vargas, Rosana. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires
Affiliation: Reves Szemere, Juliana. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de San Martín. Escuela de Ciencia y Tecnología; Argentina
Affiliation: Volcovinsky, Bruno. Instituto Tecnológico de Buenos Aires; Argentina
Affiliation: Massaroli, Hugo. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Andrade, Manuel. Universidad Torcuato Di Tella; Argentina
Affiliation: Monastra, Alejandro Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de General Sarmiento. Instituto de Ciencias; Argentina
Affiliation: Iarussi, Emmanuel. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Torcuato Di Tella; Argentina
Affiliation: Siless, Viviana. Universidad Torcuato Di Tella; Argentina
Affiliation: Bruno, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Torcuato Di Tella; Argentina
- Subject
- Pap Smear Cytology Dataset; Bethesda classification; Digital Pathology; Multiple Annotations Dataset
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/280531
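The description above says each cell can carry labels from up to four annotators, drawn from five (pre)cancerous and three non-lesion categories. The following is a minimal sketch of how such multi-annotator labels might be collapsed into a majority label and a pooled pairwise agreement rate. The toy data, the function names, and the choice of pairwise agreement are assumptions made for illustration; the record does not state how the authors computed their 94% and 74% figures, and the dataset's actual file layout is not described here.

```python
from collections import Counter
from itertools import combinations

# Category names come from the description; the grouping mirrors its
# five (pre)cancerous types and three non-lesion categories.
LESION = {"SCC", "HSIL", "ASCH", "LSIL", "ASCUS"}
NON_LESION = {"NILM", "ENDO", "INFL"}


def to_binary(label):
    """Collapse a Bethesda label into 'lesion' or 'non-lesion'."""
    if label in LESION:
        return "lesion"
    if label in NON_LESION:
        return "non-lesion"
    raise ValueError(f"unknown label: {label}")


def majority_label(labels):
    """Return the most frequent label, or None if empty or tied for first."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def pairwise_agreement(cells, key=lambda label: label):
    """Fraction of annotator pairs, pooled over all cells, that agree."""
    agree = total = 0
    for labels in cells:
        for a, b in combinations(labels, 2):
            agree += key(a) == key(b)
            total += 1
    return agree / total if total else float("nan")


# Hypothetical toy data: one list per cell, one label per annotator.
cells = [
    ["HSIL", "HSIL", "ASCH", "HSIL"],
    ["NILM", "NILM", "INFL"],
    ["LSIL", "ASCUS"],
]

print([majority_label(c) for c in cells])
print(pairwise_agreement(cells))                 # eight-category agreement
print(pairwise_agreement(cells, key=to_binary))  # lesion vs. non-lesion
```

On this toy input the binary agreement is higher than the eight-category agreement, which is the same direction as the gap reported in the description, since disagreements between two lesion grades disappear once the labels are collapsed.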
Full record metadata
| id | CONICETDig_4fd3fd9a9205ab9979ec4b56ea8492ce |
|---|---|
| oai_identifier_str | oai:ri.conicet.gov.ar:11336/280531 |
| network_acronym_str | CONICETDig |
| repository_id_str | 3498 |
| network_name_str | CONICET Digital (CONICET) |
| spelling | RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations Pérez Bianchi, Paula; Anselmo, Sol; Vásquez Currié, Malena; Medel, Jimena; Uelf, Estefanía; Dos Santos, Alicia; Buosi, Noemí; Vargas, Rosana; Reves Szemere, Juliana; Volcovinsky, Bruno; Massaroli, Hugo; Andrade, Manuel; Monastra, Alejandro Gabriel; Iarussi, Emmanuel; Siless, Viviana; Bruno, Luciana Pap Smear Cytology Dataset; Bethesda classification; Digital Pathology; Multiple Annotations Dataset https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 The Pap smear remains the primary screening test for cervical cancer in many low-resource regions, yet publicly available image datasets largely feature liquid-based preparations. We introduce RIVA, a high-resolution collection of 959 conventional-smear images (1024 × 1024 px) scanned at 40x magnification, sourced from 115 patients. To ensure label quality, each image was annotated by up to four independent medical professionals, with 42% of the images reviewed by all four, resulting in 26,158 annotations based on the Bethesda classification. Annotations provide coordinates of nuclei and classification labels by up to four annotators. The dataset includes 15,949 unique cells across five (pre)cancerous types (SCC, HSIL, ASCH, LSIL, ASCUS) and three non-lesion categories (NILM, ENDO, INFL). These four-expert annotations not only give RIVA a consensus-driven ground truth for robust AI training but also enable inter-annotator consistency analysis: agreement rates reach 94% for lesion vs. non-lesion and 74% across the full eight-category Bethesda scheme. Affiliation: Pérez Bianchi, Paula. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina Affiliation: Anselmo, Sol. Instituto Tecnológico de Buenos Aires; Argentina Affiliation: Vásquez Currié, Malena. Instituto Tecnológico de Buenos Aires; Argentina Affiliation: Medel, Jimena. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires Affiliation: Uelf, Estefanía. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires Affiliation: Dos Santos, Alicia. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires Affiliation: Buosi, Noemí. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires Affiliation: Vargas, Rosana. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires Affiliation: Reves Szemere, Juliana. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de San Martín. Escuela de Ciencia y Tecnología; Argentina Affiliation: Volcovinsky, Bruno. Instituto Tecnológico de Buenos Aires; Argentina Affiliation: Massaroli, Hugo. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina Affiliation: Andrade, Manuel. Universidad Torcuato Di Tella; Argentina Affiliation: Monastra, Alejandro Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de General Sarmiento. Instituto de Ciencias; Argentina Affiliation: Iarussi, Emmanuel. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Torcuato Di Tella; Argentina Affiliation: Siless, Viviana. Universidad Torcuato Di Tella; Argentina Affiliation: Bruno, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Torcuato Di Tella; Argentina Springer 2025-12 info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo application/pdf application/pdf http://hdl.handle.net/11336/280531 Pérez Bianchi, Paula; Anselmo, Sol; Vásquez Currié, Malena; Medel, Jimena; Uelf, Estefanía; et al.; RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations; Springer; Scientific Data; 12; 1; 12-2025; 1-7 2052-4463 CONICET Digital CONICET eng info:eu-repo/semantics/altIdentifier/url/https://www.nature.com/articles/s41597-025-06280-2 info:eu-repo/semantics/altIdentifier/doi/10.1038/s41597-025-06280-2 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas 2026-02-26T10:07:48Z oai:ri.conicet.gov.ar:11336/280531 instacron:CONICET Institutional http://ri.conicet.gov.ar/ Scientific-technological organization Not applicable http://ri.conicet.gov.ar/oai/request dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar Argentina Not applicable Not applicable Not applicable opendoar:3498 2026-02-26 10:07:48.824 CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas false |
| dc.title.none.fl_str_mv | RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations |
| title | RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations |
| spellingShingle | RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations; Pérez Bianchi, Paula; Pap Smear Cytology Dataset; Bethesda classification; Digital Pathology; Multiple Annotations Dataset |
| title_short | RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations |
| title_full | RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations |
| title_fullStr | RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations |
| title_full_unstemmed | RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations |
| title_sort | RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations |
| dc.creator.none.fl_str_mv | Pérez Bianchi, Paula; Anselmo, Sol; Vásquez Currié, Malena; Medel, Jimena; Uelf, Estefanía; Dos Santos, Alicia; Buosi, Noemí; Vargas, Rosana; Reves Szemere, Juliana; Volcovinsky, Bruno; Massaroli, Hugo; Andrade, Manuel; Monastra, Alejandro Gabriel; Iarussi, Emmanuel; Siless, Viviana; Bruno, Luciana |
| author | Pérez Bianchi, Paula |
| author_facet | Pérez Bianchi, Paula; Anselmo, Sol; Vásquez Currié, Malena; Medel, Jimena; Uelf, Estefanía; Dos Santos, Alicia; Buosi, Noemí; Vargas, Rosana; Reves Szemere, Juliana; Volcovinsky, Bruno; Massaroli, Hugo; Andrade, Manuel; Monastra, Alejandro Gabriel; Iarussi, Emmanuel; Siless, Viviana; Bruno, Luciana |
| author_role | author |
| author2 | Anselmo, Sol; Vásquez Currié, Malena; Medel, Jimena; Uelf, Estefanía; Dos Santos, Alicia; Buosi, Noemí; Vargas, Rosana; Reves Szemere, Juliana; Volcovinsky, Bruno; Massaroli, Hugo; Andrade, Manuel; Monastra, Alejandro Gabriel; Iarussi, Emmanuel; Siless, Viviana; Bruno, Luciana |
| author2_role | author author author author author author author author author author author author author author author |
| dc.subject.none.fl_str_mv | Pap Smear Cytology Dataset; Bethesda classification; Digital Pathology; Multiple Annotations Dataset |
| topic | Pap Smear Cytology Dataset; Bethesda classification; Digital Pathology; Multiple Annotations Dataset |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
| dc.description.none.fl_txt_mv | The Pap smear remains the primary screening test for cervical cancer in many low-resource regions, yet publicly available image datasets largely feature liquid-based preparations. We introduce RIVA, a high-resolution collection of 959 conventional-smear images (1024 × 1024 px) scanned at 40x magnification, sourced from 115 patients. To ensure label quality, each image was annotated by up to four independent medical professionals, with 42% of the images reviewed by all four, resulting in 26,158 annotations based on the Bethesda classification. Annotations provide coordinates of nuclei and classification labels by up to four annotators. The dataset includes 15,949 unique cells across five (pre)cancerous types (SCC, HSIL, ASCH, LSIL, ASCUS) and three non-lesion categories (NILM, ENDO, INFL). These four-expert annotations not only give RIVA a consensus-driven ground truth for robust AI training but also enable inter-annotator consistency analysis: agreement rates reach 94% for lesion vs. non-lesion and 74% across the full eight-category Bethesda scheme. Affiliation: Pérez Bianchi, Paula. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina Affiliation: Anselmo, Sol. Instituto Tecnológico de Buenos Aires; Argentina Affiliation: Vásquez Currié, Malena. Instituto Tecnológico de Buenos Aires; Argentina Affiliation: Medel, Jimena. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires Affiliation: Uelf, Estefanía. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires Affiliation: Dos Santos, Alicia. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires Affiliation: Buosi, Noemí. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires Affiliation: Vargas, Rosana. Hospital General de Agudos Bernardino Rivadavia; Gobierno de la Ciudad Autonoma de Buenos Aires Affiliation: Reves Szemere, Juliana. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de San Martín. Escuela de Ciencia y Tecnología; Argentina Affiliation: Volcovinsky, Bruno. Instituto Tecnológico de Buenos Aires; Argentina Affiliation: Massaroli, Hugo. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina Affiliation: Andrade, Manuel. Universidad Torcuato Di Tella; Argentina Affiliation: Monastra, Alejandro Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de General Sarmiento. Instituto de Ciencias; Argentina Affiliation: Iarussi, Emmanuel. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Torcuato Di Tella; Argentina Affiliation: Siless, Viviana. Universidad Torcuato Di Tella; Argentina Affiliation: Bruno, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Torcuato Di Tella; Argentina |
| description | The Pap smear remains the primary screening test for cervical cancer in many low-resource regions, yet publicly available image datasets largely feature liquid-based preparations. We introduce RIVA, a high-resolution collection of 959 conventional-smear images (1024 × 1024 px) scanned at 40x magnification, sourced from 115 patients. To ensure label quality, each image was annotated by up to four independent medical professionals, with 42% of the images reviewed by all four, resulting in 26,158 annotations based on the Bethesda classification. Annotations provide coordinates of nuclei and classification labels by up to four annotators. The dataset includes 15,949 unique cells across five (pre)cancerous types (SCC, HSIL, ASCH, LSIL, ASCUS) and three non-lesion categories (NILM, ENDO, INFL). These four-expert annotations not only give RIVA a consensus-driven ground truth for robust AI training but also enable inter-annotator consistency analysis: agreement rates reach 94% for lesion vs. non-lesion and 74% across the full eight-category Bethesda scheme. |
| publishDate | 2025 |
| dc.date.none.fl_str_mv | 2025-12 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/280531 Pérez Bianchi, Paula; Anselmo, Sol; Vásquez Currié, Malena; Medel, Jimena; Uelf, Estefanía; et al.; RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations; Springer; Scientific Data; 12; 1; 12-2025; 1-7 2052-4463 CONICET Digital CONICET |
| url | http://hdl.handle.net/11336/280531 |
| identifier_str_mv | Pérez Bianchi, Paula; Anselmo, Sol; Vásquez Currié, Malena; Medel, Jimena; Uelf, Estefanía; et al.; RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations; Springer; Scientific Data; 12; 1; 12-2025; 1-7 2052-4463 CONICET Digital CONICET |
| dc.language.none.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://www.nature.com/articles/s41597-025-06280-2 info:eu-repo/semantics/altIdentifier/doi/10.1038/s41597-025-06280-2 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf application/pdf |
| dc.publisher.none.fl_str_mv | Springer |
| publisher.none.fl_str_mv | Springer |
| dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| reponame_str | CONICET Digital (CONICET) |
| collection | CONICET Digital (CONICET) |
| instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
| _version_ | 1858305276820586496 |
| score | 13.176822 |
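
The record carries an OAI identifier (`oai_identifier_str`) and the metadata mentions an OAI request endpoint, `http://ri.conicet.gov.ar/oai/request`. As a hedged illustration, the sketch below builds a standard OAI-PMH `GetRecord` request URL from those two values. The helper name `get_record_url` is hypothetical, `oai_dc` is the metadata format every OAI-PMH repository must support, and whether the endpoint is still reachable at that address is an assumption. The code only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the record metadata; assumed, not verified, to be live.
OAI_BASE_URL = "http://ri.conicet.gov.ar/oai/request"


def get_record_url(identifier, metadata_prefix="oai_dc", base_url=OAI_BASE_URL):
    """Build an OAI-PMH GetRecord URL for a single record (hypothetical helper)."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": metadata_prefix,
        }
    )
    return f"{base_url}?{query}"


url = get_record_url("oai:ri.conicet.gov.ar:11336/280531")
print(url)
```

Passing the resulting URL to an HTTP client would, if the endpoint responds, return the record as XML in the requested metadata format.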