TreeSpark: A Distributed Tool for Progeny Analysis based on Spark

Authors
López, Paula; Hasperué, Waldo; Quiroga, Facundo Manuel; Ronchetti, Franco
Year of publication
2021
Language
English
Resource type
conference paper
Status
published version
Description
Progeny analyses are useful in the biological sciences for various purposes, such as improving individuals in new generations or carrying out molecular analyses of how genetic traits are transmitted. Analyzing these data by comparing the individuals of one generation with their offspring is not a trivial task, and its complexity grows as more generations are incorporated. In this article, we present TreeSpark, an open-source tool for progeny analysis that provides simple access to information about individuals and their relationships, both as progenitors and as descendants. The tool is developed as a Python module built on Spark, from which it inherits distributed processing capabilities that allow it to handle large volumes of progeny information. A comparison with other similar tools found TreeSpark to be much simpler to use.
Workshop: WBDMD - Base de Datos y Minería de Datos
Red de Universidades con Carreras en Informática
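The abstract describes access to each individual's progenitors and descendants, and notes that the work grows as more generations are incorporated. The plain-Python sketch below illustrates that kind of query. It is not TreeSpark's API, which this record does not describe. It indexes a small pedigree table of `(individual, father, mother)` records and walks it one generation at a time. The data and the names `PEDIGREE`, `build_index`, `descendants` and `progenitors` are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical pedigree records: (individual_id, father_id, mother_id).
# None marks an unknown parent (e.g. a founder). Not TreeSpark's data format.
PEDIGREE = [
    ("A", None, None),
    ("B", None, None),
    ("C", "A", "B"),
    ("D", "A", "B"),
    ("E", None, None),
    ("F", "C", "E"),
    ("G", "F", None),
]


def build_index(records):
    """Map each individual to its known parents and each parent to its children."""
    parents = {}
    children = defaultdict(list)
    for ind, father, mother in records:
        parents[ind] = tuple(p for p in (father, mother) if p is not None)
        for p in parents[ind]:
            children[p].append(ind)
    return parents, children


def _walk(start, neighbours, max_generations=None):
    """Breadth-first walk; returns {relative_id: generation_distance}."""
    found = {}
    queue = deque([(start, 0)])
    while queue:
        current, gen = queue.popleft()
        # Stop expanding once the requested number of generations is reached.
        if max_generations is not None and gen == max_generations:
            continue
        for relative in neighbours.get(current, ()):
            if relative not in found:
                found[relative] = gen + 1
                queue.append((relative, gen + 1))
    return found


def descendants(ind, children, max_generations=None):
    """All descendants of `ind`, walking down the tree."""
    return _walk(ind, children, max_generations)


def progenitors(ind, parents, max_generations=None):
    """All progenitors (ancestors) of `ind`, walking up the tree."""
    return _walk(ind, parents, max_generations)


if __name__ == "__main__":
    parents, children = build_index(PEDIGREE)
    print(descendants("A", children))  # {'C': 1, 'D': 1, 'F': 2, 'G': 3}
    print(progenitors("G", parents))   # {'F': 1, 'C': 2, 'E': 2, 'A': 3, 'B': 3}
```

Each pass of the loop advances one generation, so the cost grows with pedigree depth. On Spark, one plausible way to express the same step is to join the current frontier with the parent-child table once per generation. This record does not say whether TreeSpark works this way.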
Subject
Computer Science
Spark
Big data
Progeny analysis
Genealogy
Analytics
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/130340

Publication date
2021-10
URL
http://sedici.unlp.edu.ar/handle/10915/130340
ISBN
978-987-633-574-4
Related record (handle)
10915/129809
License
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Format
application/pdf
Pages
251-260