Patient-centric synthetic data generation: a new methodology for Chronic Kidney Disease

Authors
Álvarez, Candelaria; Ibeas, José; Balladini, Javier; Suppi, Remo
Publication year
2024
Language
English
Resource type
conference paper
Status
published version
Description
Access to medical data is often restricted by privacy and security policies. Generating synthetic data from real data is a widely adopted way to work around these restrictions. This research presents a patient-centric methodology for generating synthetic data, designed specifically for patients diagnosed with Chronic Kidney Disease (CKD). Its key advantage is the explainability and traceability of its results, since it relies on statistics and data analysis rather than AI algorithms. The MIMIC-III clinical dataset serves as the basis for generating synthetic patients in this study. The article details the preprocessing and filtering applied to this dataset, then generates synthetic data for CKD patients using the proposed methodology. The synthetic data is compared with the real data and with data generated by the AI algorithm SMOTE (Synthetic Minority Over-sampling Technique). In general, the metrics for the SMOTE-generated data are slightly better; however, the data produced by the proposed methodology deviates only minimally from the MIMIC data across most CKD stages.
Red de Universidades con Carreras en Informática
Subject
Computer Science
patient-centric methodology
synthetic data generation
chronic kidney disease
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/176195

id SEDICI_8030f64f89131accbf6dd25390506baf
oai_identifier_str oai:sedici.unlp.edu.ar:10915/176195
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
publishDate 2024
dc.date.none.fl_str_mv 2024-10
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Conference object
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
format conferenceObject
status_str publishedVersion
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/176195
url http://sedici.unlp.edu.ar/handle/10915/176195
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/isbn/978-950-34-2428-5
info:eu-repo/semantics/reference/hdl/10915/172755
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv application/pdf
280-289
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar