Patient-centric synthetic data generation: a new methodology for Chronic Kidney Disease
- Authors
- Álvarez, Candelaria; Ibeas, José; Balladini, Javier; Suppi, Remo
- Publication year
- 2024
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Access to medical data is often restricted due to privacy and security policies. Synthetic data generation from real data is a widely adopted technique to address these limitations. This research presents a patient-centric methodology for generating synthetic data, specifically designed for patients diagnosed with Chronic Kidney Disease (CKD). The key advantage of this proposal is its explainability and the traceability of the results, as it relies on statistics and data analysis rather than AI algorithms. The MIMIC-III clinical dataset serves as the foundation for generating synthetic patients in this study. This article details the data preprocessing and filtering applied to this dataset. Subsequently, synthetic data for CKD patients is generated using the proposed methodology. A comparison is then conducted between the synthetic data and the real data. Additionally, the synthetic data is compared with results obtained using the AI algorithm known as SMOTE. Generally, the metrics for the synthetic data generated by SMOTE are slightly superior. However, the results obtained with the proposed methodology exhibit minimal deviations from the MIMIC data across most CKD stages.
- Red de Universidades con Carreras en Informática
- Subject
- Computer Science
- patient-centric methodology
- synthetic data generation
- chronic kidney disease
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- Repository
- SEDICI (UNLP)
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/176195
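The description above names SMOTE as the baseline against which the proposed methodology is compared. For readers unfamiliar with it, the following is a minimal sketch of the interpolation idea behind SMOTE, not the authors' code and not their patient-centric method. The function name `smote_sample`, its parameters, and the toy values are assumptions made for illustration only; a maintained implementation is available as `imblearn.over_sampling.SMOTE` in the imbalanced-learn package.

```python
import numpy as np


def smote_sample(X, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly
    chosen real row and one of its k nearest neighbours (illustrative)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("at least two rows are needed to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances; a row is never its own neighbour.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a base row and one of its neighbours for every synthetic row.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic row at a random point on the segment between them.
    gap = rng.random((n_new, 1))
    return X[base] + gap * (X[chosen] - X[base])


# Hypothetical toy rows (two numeric features); not taken from MIMIC-III.
real = np.array([[1.0, 60.0], [1.2, 55.0], [0.9, 70.0], [1.1, 65.0]])
synthetic = smote_sample(real, n_new=10, k=2)
```

Because every synthetic row lies on a segment between two real rows, its values stay within the range of the real data, which is one reason interpolation-based generators tend to score well on distribution-similarity metrics.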
View the full record metadata
id | SEDICI_8030f64f89131accbf6dd25390506baf |
---|---|
oai_identifier_str | oai:sedici.unlp.edu.ar:10915/176195 |
network_acronym_str | SEDICI |
repository_id_str | 1329 |
network_name_str | SEDICI (UNLP) |
dc.title.none.fl_str_mv | Patient-centric synthetic data generation: a new methodology for Chronic Kidney Disease |
dc.creator.none.fl_str_mv | Álvarez, Candelaria; Ibeas, José; Balladini, Javier; Suppi, Remo |
author | Álvarez, Candelaria |
author2 | Ibeas, José; Balladini, Javier; Suppi, Remo |
dc.subject.none.fl_str_mv | Ciencias Informáticas; patient-centric methodology; synthetic data generation; chronic kidney disease |
publishDate | 2024 |
dc.date.none.fl_str_mv | 2024-10 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Objeto de conferencia; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
format | conferenceObject |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/176195 |
dc.language.none.fl_str_mv | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/isbn/978-950-34-2428-5; info:eu-repo/semantics/reference/hdl/10915/172755 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.format.none.fl_str_mv | application/pdf; 280-289 |
dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |