Preserving accuracy in GenBank
- Authors
- Bidartondo, Martin I.; Bruns, Thomas D.; Blackwell, Meredith; Edwards, Ivan; Taylor, Andy F. S.; Bianchinotti, Maria Virginia; Padamsee, Mahajabeen; Callac, Philippe; Lima, Nelson; White, Merlin M.; Barreau Daly, Camila; Juncai, M. A.; Buyck, Bart; Rabeler, Richard K.; Liles, Mark R.; Estes, Dwayne; Carter, Richard; Herr Jr., J. M.; Chandler, Gregory; Kerekes, Jennifer; Cruse Sanders, Jennifer; Galán Marquez, R.; Horak, Egon; Fitzsimons, Michael; Döering, Heidi; Yao, Su; Hynson, Nicole; Ryberg, Martin; Arnold, A. E.; Hughes, Karen
- Publication year
- 2008
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- GenBank, the public repository for nucleotide and protein sequences, is a critical resource for molecular biology, evolutionary biology, and ecology. While some attention has been drawn to sequence errors, common annotation errors also reduce the value of this database. In fact, for organisms such as fungi, which are notoriously difficult to identify, up to 20% of DNA sequence records may have erroneous lineage designations in GenBank. Gene function annotation in protein sequence databases is similarly error-prone. Because identity and function of new sequences are often determined by bioinformatic analyses, both types of errors are propagated into new accessions, leading to long-term degradation of the quality of the database. Currently, primary sequence data are annotated by the authors of those data, and can only be reannotated by the same authors. This is inefficient and unsustainable over the long term as authors eventually leave the field. Although it is possible to link third-party databases to GenBank records, this is a short-term solution that has little guarantee of permanence. Similarly, the current third-party annotation option in GenBank (TPA) complicates rather than solves the problem by creating an identical record with a new annotation, while leaving the original record unflagged and unlinked to the new record. Since the origin of public zoological and botanical specimen collections, an open system of cumulative annotation has evolved, whereby the original name is retained, but additional opinion is directly appended and used for filing and retrieval. This was needed as new specimens and analyses allowed for reevaluation of older specimens and the original depositors became unavailable. The time has come for the public sequence database to incorporate a community-curated, cumulative annotation process that allows third parties to improve the annotations of sequences when warranted by published peer-reviewed analyses.
Affiliation: Bidartondo, Martin I. Imperial College London; United Kingdom. Royal Botanic Gardens; United Kingdom
Affiliation: Bruns, Thomas D. University of California at Berkeley; United States
Affiliation: Blackwell, Meredith. Louisiana State University; United States
Affiliation: Edwards, Ivan. University of Michigan; United States
Affiliation: Taylor, Andy F. S. Swedish University of Agricultural Sciences; Sweden
Affiliation: Bianchinotti, Maria Virginia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina. Universidad Nacional del Sur; Argentina
Affiliation: Padamsee, Mahajabeen. University of Minnesota; United States
Affiliation: Callac, Philippe. Institut National de la Recherche Agronomique; France
Affiliation: Lima, Nelson. Universidade do Minho; Portugal
Affiliation: White, Merlin M. Boise State University; United States
Affiliation: Barreau Daly, Camila. Centre National de la Recherche Scientifique; France. Institut National de la Recherche Agronomique; France
Affiliation: Juncai, M. A. Chinese Academy of Sciences; Republic of China
Affiliation: Buyck, Bart. Museum National d'Histoire Naturelle; France
Affiliation: Rabeler, Richard K. University of Michigan; United States
Affiliation: Liles, Mark R. Auburn University; United States
Affiliation: Estes, Dwayne. Austin Peay State University; United States
Affiliation: Carter, Richard. Valdosta State University; United States
Affiliation: Herr Jr., J. M. University of South Carolina; United States
Affiliation: Chandler, Gregory. University of North Carolina; United States
Affiliation: Kerekes, Jennifer. University of California at Berkeley; United States
Affiliation: Cruse Sanders, Jennifer. Salem College Herbarium; United States
Affiliation: Galán Marquez, R. Universidad de Alcalá; Spain
Affiliation: Horak, Egon. Zurich Herbarium; Switzerland
Affiliation: Fitzsimons, Michael. University of Chicago; United States
Affiliation: Döering, Heidi. Royal Botanic Gardens; United Kingdom
Affiliation: Yao, Su. China Center of Industrial Culture Collection; China
Affiliation: Hynson, Nicole. University of California at Berkeley; United States
Affiliation: Ryberg, Martin. University Goteborg; Sweden
Affiliation: Arnold, A. E. University of Arizona; United States
Affiliation: Hughes, Karen. University of Tennessee; United States
- Subject
- ITS; Taxonomy; Ecology; Bioinformatics
- Access level
- open access
- Terms of use
- Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5 AR)
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/45720
Full record metadata
id | CONICETDig_737b85d6b103c4cec4ca895928b6aa45 |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/45720 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | Preserving accuracy in GenBank |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.6 https://purl.org/becyt/ford/1 |
publishDate | 2008 |
dc.date.none.fl_str_mv | 2008-03-21 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/45720 Bidartondo, Martin I.; Bruns, Thomas D.; Blackwell, Meredith; Edwards, Ivan; Taylor, Andy F. S.; et al.; Preserving accuracy in GenBank; American Association for the Advancement of Science; Science; 319; 5870; 21-3-2008; 1616 0036-8075 CONICET Digital CONICET |
url | http://hdl.handle.net/11336/45720 |
identifier_str_mv | Bidartondo, Martin I.; Bruns, Thomas D.; Blackwell, Meredith; Edwards, Ivan; Taylor, Andy F. S.; et al.; Preserving accuracy in GenBank; American Association for the Advancement of Science; Science; 319; 5870; 21-3-2008; 1616 0036-8075 CONICET Digital CONICET |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/reference/url/http://science.sciencemag.org/content/sci/suppl/2008/03/20/319.5870.1616a.DC1/Bidartondo.SOM.pdf info:eu-repo/semantics/altIdentifier/url/http://science.sciencemag.org/content/319/5870/1616.1 info:eu-repo/semantics/altIdentifier/doi/10.1126/science.319.5870.1616a |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess Atribución-NoComercial-CompartirIgual 2.5 Argentina (CC BY-NC-SA 2.5 AR) https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | Atribución-NoComercial-CompartirIgual 2.5 Argentina (CC BY-NC-SA 2.5 AR) https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf application/pdf application/pdf |
dc.publisher.none.fl_str_mv | American Association for the Advancement of Science |
publisher.none.fl_str_mv | American Association for the Advancement of Science |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |