Pooch: A friend to fetch your data files
- Authors
- Uieda, Leonardo; Soler, Santiago Rubén; Rampin, Rémi; Kemenade, Hugo van; Turk, Matthew; Shapero, Daniel; Banihirwe, Anderson; Leeman, John
- Year of publication
- 2020
- Language
- English
- Resource type
- article
- Status
- published version
- Description
Scientific software is usually created to acquire, analyze, model, and visualize data. As such, many software libraries include sample datasets in their distributions for use in documentation, tests, benchmarks, and workshops. A common approach is to include smaller datasets directly in the GitHub repository and package them with the source and binary distributions, as scikit-learn (Pedregosa et al., 2011) and scikit-image (Van der Walt et al., 2014) do. As data files grow in size, storing them in GitHub repositories becomes infeasible, so larger datasets require code that downloads the files from a remote server to the user’s computer. Scientists who use version control to manage their research projects face the same problem. Modern Python libraries make it easy to download a single data file over HTTPS. Managing a set of files, keeping them up to date, and checking them for corruption is harder. For example, scikit-learn (Pedregosa et al., 2011), Cartopy (Met Office, n.d.), and PyVista (Sullivan & Kaszynski, 2019) all include code dedicated to this task. Rather than having scientists and library authors recreate the same code, it would be better to have a minimal, easy-to-set-up tool for fetching and maintaining data files.
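The bookkeeping described above involves three steps: download each file once, cache it locally, and verify it against a known hash so that corrupted or outdated copies are detected. It can be sketched with the Python standard library alone. This is a minimal illustration, not Pooch's implementation or API. The function names `fetch` and `file_sha256`, the `registry` dictionary that maps file names to SHA256 hashes, and the requirement that `base_url` end in `/` are all assumptions made for this example.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(fname: str, base_url: str, registry: dict, cache_dir: Path) -> Path:
    """Download ``fname`` into ``cache_dir`` unless a verified copy is already there.

    Hypothetical helper: ``registry`` maps file names to expected SHA256 hashes
    and ``base_url`` is assumed to end with a slash.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / fname
    expected = registry[fname]
    # Reuse the cached copy only if its hash still matches the registry.
    # A mismatch means the file is corrupted or the registry was updated.
    if target.exists() and file_sha256(target) == expected:
        return target
    # Download to a temporary ".part" name first so that a failed transfer
    # never leaves a half-written file under the final name.
    tmp = target.with_name(target.name + ".part")
    with urllib.request.urlopen(base_url + fname) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    actual = file_sha256(tmp)
    if actual != expected:
        tmp.unlink()
        raise ValueError(f"Hash mismatch for {fname}: expected {expected}, got {actual}")
    tmp.replace(target)
    return target
```

A second call to `fetch` with an unchanged registry returns the cached file without touching the network. Changing a file's hash in the registry forces a fresh download, which is one simple way to keep a set of files up to date. Pooch itself exposes this workflow through `pooch.create` and `Pooch.fetch`, with more options than this sketch covers.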
Affiliation: Uieda, Leonardo. University of Liverpool; United Kingdom
Affiliation: Soler, Santiago Rubén. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - San Juan; Argentina. Universidad Nacional de San Juan. Facultad de Ciencias Exactas, Físicas y Naturales. Instituto Geofísico Sismológico Volponi; Argentina
Affiliation: Rampin, Rémi. University of New York; United States
Affiliation: Kemenade, Hugo van. Not specified
Affiliation: Turk, Matthew. University of Illinois. Urbana-Champaign; United States
Affiliation: Shapero, Daniel. University of Washington; United States
Affiliation: Banihirwe, Anderson. National Center for Atmospheric Research; United States
Affiliation: Leeman, John. Leeman Geophysical; United States
- Subjects
- OPEN SOURCE, PYTHON, DATA, JOSS
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/156774
- Citation
- Uieda, Leonardo; Soler, Santiago Rubén; Rampin, Rémi; Kemenade, Hugo van; Turk, Matthew; et al. Pooch: A friend to fetch your data files. Journal of Open Source Software, vol. 5, no. 45, January 2020, pp. 1-3. ISSN 2475-9066
- Handle
- http://hdl.handle.net/11336/156774
- DOI
- 10.21105/joss.01943 (https://joss.theoj.org/papers/10.21105/joss.01943)