The importance of digitized biocollections as a source of trait data and a new VertNet resource
- Authors
- Guralnick, Robert P.; Zermoglio, Paula Florencia; Wieczorek, John; LaFrance, Raphael; Bloom, David; Russell, Laura
- Year of publication
- 2016
- Language
- English
- Resource type
- article
- Status
- published version
- Description
For vast areas of the globe and large parts of the tree of life, the data needed to characterize trait diversity are incomplete. When fully assembled, however, such trait data form the link between the evolutionary history of organisms, their assembly into communities, and the nature and functioning of ecosystems. Recent efforts to close these data gaps have focused on collating trait-by-species databases, which provide only species-level, aggregated value ranges for traits of interest and often lack the direct observations on which those ranges are based. What is perhaps underappreciated is that digitized biocollection records collectively contain a vast trove of trait data measured directly from individuals, but this content remains hidden and highly heterogeneous, impeding discoverability and use. We developed and deployed a suite of openly accessible software tools to collate a full set of trait descriptions and to extract two key traits, body length and mass, from more than 18 million specimen records in VertNet, a global biodiversity data publisher and aggregator. We tested the success rate of these tools against hand-checked validation data sets and characterized the quality and quantity of the extracted data. We then developed a post-processing toolkit to standardize and harmonize the data sets and to integrate this improved content into VertNet for the broadest possible reuse. This work added more than 1.5 million harmonized measurements of vertebrate body mass and length directly to specimen records, with extremely low rates of false positives and false negatives in the extracted data. We also created new tools for filtering, querying, and assembling this research-ready vertebrate trait content for viewing and download. Our work has yielded a novel database and platform for harmonized trait content that will grow as the tools introduced here become part of publication workflows. We close by noting how this effort extends to new communities already developing similar digitized content.
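The description does not spell out how measurements are parsed from specimen records. As a rough illustration only, the Python sketch below pulls a body mass and a total length out of a free-text string of the kind that might appear in a Darwin Core field such as `dynamicProperties`, then converts each value to one canonical unit (grams or millimeters). The regular expressions, unit tables, example strings, and function names (`extract_body_mass_g`, `extract_total_length_mm`) are hypothetical assumptions, not the VertNet toolkit's actual API or parsing rules.

```python
import re
from typing import Dict, Optional

# Hypothetical unit tables: factors converting each unit token to the
# canonical unit (grams for mass, millimeters for length).
MASS_TO_GRAMS: Dict[str, float] = {
    "mg": 0.001, "g": 1.0, "gm": 1.0, "gms": 1.0, "kg": 1000.0,
    "oz": 28.349523125, "lb": 453.59237, "lbs": 453.59237,
}
LENGTH_TO_MM: Dict[str, float] = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4}

# Hypothetical patterns: keyword, optional ':' or '=', a decimal number, a unit token.
MASS_PATTERN = re.compile(
    r"\b(?:body\s+)?(?:weight|mass|wt)\s*[:=]?\s*(\d+(?:\.\d+)?)\s*([a-z]+)\b",
    re.IGNORECASE,
)
LENGTH_PATTERN = re.compile(
    r"\b(?:total\s+length|body\s+length|tl)\s*[:=]?\s*(\d+(?:\.\d+)?)\s*([a-z]+)\b",
    re.IGNORECASE,
)


def _extract(text: str, pattern: re.Pattern, units: Dict[str, float]) -> Optional[float]:
    """Convert the first match in `text` to the canonical unit, or return None."""
    match = pattern.search(text)
    if match is None:
        return None
    value, unit = float(match.group(1)), match.group(2).lower()
    factor = units.get(unit)
    if factor is None:
        # Unrecognized unit: leave the record for review rather than guess.
        return None
    return value * factor


def extract_body_mass_g(text: str) -> Optional[float]:
    """Hypothetical helper: body mass in grams, if one can be parsed."""
    return _extract(text, MASS_PATTERN, MASS_TO_GRAMS)


def extract_total_length_mm(text: str) -> Optional[float]:
    """Hypothetical helper: total length in millimeters, if one can be parsed."""
    return _extract(text, LENGTH_PATTERN, LENGTH_TO_MM)


if __name__ == "__main__":
    remarks = "weight: 25.5 g; total length: 120 mm"  # invented example string
    print(extract_body_mass_g(remarks))      # 25.5
    print(extract_total_length_mm(remarks))  # 120.0
```

Returning `None` for an unrecognized unit, instead of guessing, is one simple way to keep false positives low; a realistic parser would need many more keyword variants, value ranges, and unit spellings than this sketch handles.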
Affiliation: Guralnick, Robert P.: University of Florida; United States
Affiliation: Zermoglio, Paula Florencia: Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Ecología, Genética y Evolución de Buenos Aires. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Ecología, Genética y Evolución de Buenos Aires; Argentina. Université François Rabelais; France
Affiliation: Wieczorek, John: University of California at Berkeley; United States
Affiliation: LaFrance, Raphael: University of Florida; United States
Affiliation: Bloom, David: University of Florida; United States
Affiliation: Russell, Laura: University of Florida; United States. University of Kansas; United States
- Subjects
- Biodiversity informatics
- Body mass
- Body length
- Data mining
- Digitization
- Darwin Core
- Natural history collections
- Semantics
- Standards
- Trait data
- VertNet
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/61959
Full record metadata
id |
CONICETDig_5b2a967f6d20570a8c5f95c9a4cd94ce |
---|---|
oai_identifier_str |
oai:ri.conicet.gov.ar:11336/61959 |
network_acronym_str |
CONICETDig |
repository_id_str |
3498 |
network_name_str |
CONICET Digital (CONICET) |
dc.title.none.fl_str_mv |
The importance of digitized biocollections as a source of trait data and a new VertNet resource |
dc.creator.none.fl_str_mv |
Guralnick, Robert P.; Zermoglio, Paula Florencia; Wieczorek, John; LaFrance, Raphael; Bloom, David; Russell, Laura |
dc.subject.none.fl_str_mv |
Biodiversity informatics; Body mass; Body length; Data mining; Digitization; Darwin Core; Natural history collections; Semantics; Standards; Trait data; VertNet |
purl_subject.fl_str_mv |
https://purl.org/becyt/ford/1.6 https://purl.org/becyt/ford/1 https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
publishDate |
2016 |
dc.date.none.fl_str_mv |
2016-01 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11336/61959 Guralnick, Robert P.; Zermoglio, Paula Florencia; Wieczorek, John; LaFrance, Raphael; Bloom, David; et al.; The importance of digitized biocollections as a source of trait data and a new VertNet resource; Oxford University Press; Database; 2016; 1-2016; 1-13 1758-0463 CONICET Digital CONICET |
url |
http://hdl.handle.net/11336/61959 |
dc.language.none.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/doi/10.1093/database/baw158 info:eu-repo/semantics/altIdentifier/url/https://academic.oup.com/database/article/doi/10.1093/database/baw158/2742077 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv |
openAccess |
dc.format.none.fl_str_mv |
application/pdf application/pdf |
dc.publisher.none.fl_str_mv |
Oxford University Press |
dc.source.none.fl_str_mv |
reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str |
CONICET Digital (CONICET) |
instname_str |
Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv |
CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv |
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |