Preserving accuracy in GenBank

Authors
Bidartondo, Martin I.; Bruns, Thomas D.; Blackwell, Meredith; Edwards, Ivan; Taylor, Andy F. S.; Bianchinotti, Maria Virginia; Padamsee, Mahajabeen; Callac, Philippe; Lima, Nelson; White, Merlin M.; Barreau Daly, Camila; Juncai, M. A.; Buyck, Bart; Rabeler, Richard K.; Liles, Mark R.; Estes, Dwayne; Carter, Richard; Herr Jr., J. M.; Chandler, Gregory; Kerekes, Jennifer; Cruse Sanders, Jennifer; Galán Marquez, R.; Horak, Egon; Fitzsimons, Michael; Döering, Heidi; Yao, Su; Hynson, Nicole; Ryberg, Martin; Arnold, A. E.; Hughes, Karen
Publication year
2008
Language
English
Resource type
article
Status
published version
Description
GenBank, the public repository for nucleotide and protein sequences, is a critical resource for molecular biology, evolutionary biology, and ecology. While some attention has been drawn to sequence errors, common annotation errors also reduce the value of this database. In fact, for organisms such as fungi, which are notoriously difficult to identify, up to 20% of DNA sequence records may have erroneous lineage designations in GenBank. Gene function annotation in protein sequence databases is similarly error-prone. Because identity and function of new sequences are often determined by bioinformatic analyses, both types of errors are propagated into new accessions, leading to long-term degradation of the quality of the database. Currently, primary sequence data are annotated by the authors of those data, and can only be reannotated by the same authors. This is inefficient and unsustainable over the long term as authors eventually leave the field. Although it is possible to link third-party databases to GenBank records, this is a short-term solution that has little guarantee of permanence. Similarly, the current third-party annotation option in GenBank (TPA) complicates rather than solves the problem by creating an identical record with a new annotation, while leaving the original record unflagged and unlinked to the new record. Since the origin of public zoological and botanical specimen collections, an open system of cumulative annotation has evolved, whereby the original name is retained, but additional opinion is directly appended and used for filing and retrieval. This was needed as new specimens and analyses allowed for reevaluation of older specimens and the original depositors became unavailable. The time has come for the public sequence database to incorporate a community-curated, cumulative annotation process that allows third parties to improve the annotations of sequences when warranted by published peer-reviewed analyses.
Affiliation: Bidartondo, Martin I. Imperial College London; United Kingdom. Royal Botanic Gardens; United Kingdom
Affiliation: Bruns, Thomas D. University of California at Berkeley; United States
Affiliation: Blackwell, Meredith. Louisiana State University; United States
Affiliation: Edwards, Ivan. University of Michigan; United States
Affiliation: Taylor, Andy F. S. Swedish University of Agricultural Sciences; Sweden
Affiliation: Bianchinotti, Maria Virginia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina. Universidad Nacional del Sur; Argentina
Affiliation: Padamsee, Mahajabeen. University of Minnesota; United States
Affiliation: Callac, Philippe. Institut National de la Recherche Agronomique; France
Affiliation: Lima, Nelson. Universidade do Minho; Portugal
Affiliation: White, Merlin M. Boise State University; United States
Affiliation: Barreau Daly, Camila. Centre National de la Recherche Scientifique; France. Institut National de la Recherche Agronomique; France
Affiliation: Juncai, M. A. Chinese Academy of Sciences; China
Affiliation: Buyck, Bart. Museum National d'Histoire Naturelle; France
Affiliation: Rabeler, Richard K. University of Michigan; United States
Affiliation: Liles, Mark R. Auburn University; United States
Affiliation: Estes, Dwayne. Austin Peay State University; United States
Affiliation: Carter, Richard. Valdosta State University; United States
Affiliation: Herr Jr., J. M. University of South Carolina; United States
Affiliation: Chandler, Gregory. University of North Carolina; United States
Affiliation: Kerekes, Jennifer. University of California at Berkeley; United States
Affiliation: Cruse Sanders, Jennifer. Salem College Herbarium; United States
Affiliation: Galán Marquez, R. Universidad de Alcalá; Spain
Affiliation: Horak, Egon. Zurich Herbarium; Switzerland
Affiliation: Fitzsimons, Michael. University of Chicago; United States
Affiliation: Döering, Heidi. Royal Botanic Gardens; United Kingdom
Affiliation: Yao, Su. China Center of Industrial Culture Collection; China
Affiliation: Hynson, Nicole. University of California at Berkeley; United States
Affiliation: Ryberg, Martin. University Goteborg; Sweden
Affiliation: Arnold, A. E. University of Arizona; United States
Affiliation: Hughes, Karen. University of Tennessee; United States
Subject
ITS
Taxonomy
Ecology
Bioinformatics
Access level
open access
Terms of use
Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5 AR)
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/45720

purl_subject.fl_str_mv https://purl.org/becyt/ford/1.6
https://purl.org/becyt/ford/1
publishDate 2008
dc.date.none.fl_str_mv 2008-03-21
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/45720
Bidartondo, Martin I.; Bruns, Thomas D.; Blackwell, Meredith; Edwards, Ivan; Taylor, Andy F. S.; et al.; Preserving accuracy in GenBank; American Association for the Advancement of Science; Science; 319; 5870; 21-3-2008; 1616
0036-8075
CONICET Digital
CONICET
dc.relation.none.fl_str_mv info:eu-repo/semantics/reference/url/http://science.sciencemag.org/content/sci/suppl/2008/03/20/319.5870.1616a.DC1/Bidartondo.SOM.pdf
info:eu-repo/semantics/altIdentifier/url/http://science.sciencemag.org/content/319/5870/1616.1
info:eu-repo/semantics/altIdentifier/doi/10.1126/science.319.5870.1616a
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5 AR)
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv American Association for the Advancement of Science