The importance of digitized biocollections as a source of trait data and a new VertNet resource

Authors
Guralnick, Robert P.; Zermoglio, Paula Florencia; Wieczorek, John; LaFrance, Raphael; Bloom, David; Russell, Laura
Publication year
2016
Language
English
Resource type
article
Status
published version
Description
For vast areas of the globe and large parts of the tree of life, data needed to inform trait diversity is incomplete. Such trait data, when fully assembled, however, form the link between the evolutionary history of organisms, their assembly into communities, and the nature and functioning of ecosystems. Recent efforts to close data gaps have focused on collating trait-by-species databases, which only provide species-level, aggregated value ranges for traits of interest and often lack the direct observations on which those ranges are based. Perhaps under-appreciated is that digitized biocollection records collectively contain a vast trove of trait data measured directly from individuals, but this content remains hidden and highly heterogeneous, impeding discoverability and use. We developed and deployed a suite of openly accessible software tools in order to collate a full set of trait descriptions and extract two key traits, body length and mass, from >18 million specimen records in VertNet, a global biodiversity data publisher and aggregator. We tested success rate of these tools against hand-checked validation data sets and characterized quality and quantity. A post-processing toolkit was developed to standardize and harmonize data sets, and to integrate this improved content into VertNet for broadest reuse. The result of this work was to add more than 1.5 million harmonized measurements on vertebrate body mass and length directly to specimen records. Rates of false positives and negatives for extracted data were extremely low. We also created new tools for filtering, querying, and assembling this research-ready vertebrate trait content for view and download. Our work has yielded a novel database and platform for harmonized trait content that will grow as tools introduced here become part of publication workflows. We close by noting how this effort extends to new communities already developing similar digitized content.
Affiliation: Guralnick, Robert P.. University of Florida; United States
Affiliation: Zermoglio, Paula Florencia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Ecología, Genética y Evolución de Buenos Aires. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Ecología, Genética y Evolución de Buenos Aires; Argentina. Universite Francois Rabelais; France
Affiliation: Wieczorek, John. University of California at Berkeley; United States
Affiliation: LaFrance, Raphael. University of Florida; United States
Affiliation: Bloom, David. University of Florida; United States
Affiliation: Russell, Laura. University of Florida; United States. University of Kansas; United States
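The extraction and harmonization steps summarized in the description can be pictured with a minimal sketch. This is not the authors' software: the regular expressions, the unit tables, and the function name `extract_traits` are all assumptions made for illustration, and real specimen records are far more heterogeneous than these patterns can handle.

```python
import re

# Hypothetical conversion tables (assumed for this sketch): factors that
# harmonize reported units to grams and millimetres.
MASS_TO_G = {"g": 1.0, "kg": 1000.0, "mg": 0.001}
LENGTH_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}

NUMBER = r"(\d+(?:\.\d+)?)"

# A trait keyword, optional punctuation, a number, then a unit.
MASS_RE = re.compile(
    r"\b(?:body\s+mass|mass|weight|wt)\b\W*" + NUMBER + r"\s*(kg|mg|g)\b",
    re.IGNORECASE,
)
LENGTH_RE = re.compile(
    r"\b(?:total\s+length|body\s+length|length)\b\W*" + NUMBER + r"\s*(mm|cm|m)\b",
    re.IGNORECASE,
)


def extract_traits(text):
    """Return the first body mass (g) and length (mm) found in free text."""
    result = {}
    mass = MASS_RE.search(text)
    if mass:
        result["mass_g"] = float(mass.group(1)) * MASS_TO_G[mass.group(2).lower()]
    length = LENGTH_RE.search(text)
    if length:
        result["length_mm"] = float(length.group(1)) * LENGTH_TO_MM[length.group(2).lower()]
    return result


print(extract_traits("weight: 23.5 g; total length 185mm"))
```

A record with no recognizable measurement yields an empty dictionary, which keeps "nothing found" distinct from a measured value of zero.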
Subject
Biodiversity informatics
Body mass
Body length
Data mining
Digitization
Darwin Core
Natural history collections
Semantics
Standards
Trait data
VertNet
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/61959

id CONICETDig_5b2a967f6d20570a8c5f95c9a4cd94ce
oai_identifier_str oai:ri.conicet.gov.ar:11336/61959
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.6
https://purl.org/becyt/ford/1
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2016
dc.date.none.fl_str_mv 2016-01
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/61959
Guralnick, Robert P.; Zermoglio, Paula Florencia; Wieczorek, John; LaFrance, Raphael; Bloom, David; et al.; The importance of digitized biocollections as a source of trait data and a new VertNet resource; Oxford University Press; Database; 2016; 1-2016; 1-13
1758-0463
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/doi/10.1093/database/baw158
info:eu-repo/semantics/altIdentifier/url/https://academic.oup.com/database/article/doi/10.1093/database/baw158/2742077
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Oxford University Press
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar