GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data

Authors
Cooke, Thomas F.; Yee, Muh-Ching; Muzzio, Marina; Sockell, Alexandra; Bell, Ryan; Cornejo, Omar E.; Kelley, Joanna L.; Bailliet, Graciela; Bravi, Claudio Marcelo; Bustamante, Carlos D.; Kenny, Eimear
Year of publication
2016
Language
English
Resource type
article
Status
published version
Description
Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.
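The abstract contrasts GBStools with Hardy-Weinberg equilibrium (HWE) p-value filters. The Python sketch below is a rough illustration of why restriction site polymorphisms produce genotype errors and how an HWE test reacts to them, using expected genotype counts. It is not the GBStools method. The dropout model, the function names `expected_called_genotypes` and `hwe_chi_square`, and all parameter values are hypothetical assumptions chosen only for illustration.

```python
import math


def hwe_chi_square(n_aa: float, n_ab: float, n_bb: float) -> tuple[float, float]:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Accepts (possibly non-integer) genotype counts for AA, AB and BB and
    returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A among called genotypes
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def expected_called_genotypes(
    n: int, freq_b: float, d: float
) -> tuple[float, float, float, float]:
    """Expected called genotype counts (AA, AB, BB, missing) under a toy
    dropout model. Hypothetical assumptions: true genotypes are in HWE, and
    a fraction d of B-carrying haplotypes also carry a variant that destroys
    the restriction site, so those haplotypes yield no reads.
    """
    p, q = 1.0 - freq_b, freq_b
    true_aa, true_ab, true_bb = n * p * p, 2 * n * p * q, n * q * q
    # AB samples whose B haplotype drops out show only A reads -> called AA.
    het_to_aa = true_ab * d
    # BB samples lose all reads only if both haplotypes are disrupted; with
    # one disrupted haplotype they are still called BB, at reduced depth.
    bb_missing = true_bb * d * d
    return true_aa + het_to_aa, true_ab - het_to_aa, true_bb - bb_missing, bb_missing


if __name__ == "__main__":
    for d in (0.0, 0.5):
        aa, ab, bb, missing = expected_called_genotypes(1000, 0.2, d)
        stat, p_value = hwe_chi_square(aa, ab, bb)
        called_freq_b = (2 * bb + ab) / (2 * (aa + ab + bb))
        print(
            f"d={d}: AA/AB/BB={aa:.0f}/{ab:.0f}/{bb:.0f}, missing={missing:.0f}, "
            f"B freq={called_freq_b:.3f}, HWE p={p_value:.2g}"
        )
```

With d = 0.5 in this toy setting, 160 of the 320 expected heterozygotes become AA calls, the apparent B allele frequency falls from 0.200 to about 0.111, and the HWE test flags a strong deviation. With smaller d or fewer samples the heterozygote deficit is weaker and the test has less power. Dropout also lowers read depth in affected samples, a signal this simple model ignores.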
Affiliations
Cooke, Thomas F.: Stanford University; United States
Yee, Muh-Ching: Carnegie Institution for Science; United States
Muzzio, Marina: Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Naturales y Museo; Argentina. Stanford University; United States. Charles Bronfman Institute of Personalized Medicine; United States
Sockell, Alexandra: Stanford University; United States
Bell, Ryan: Stanford University; United States
Cornejo, Omar E.: Stanford University; United States. Washington State University; United States
Kelley, Joanna L.: Stanford University; United States. Washington State University; United States
Bailliet, Graciela: Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Multidisciplinario de Biología Celular. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Multidisciplinario de Biología Celular. Universidad Nacional de La Plata. Instituto Multidisciplinario de Biología Celular; Argentina
Bravi, Claudio Marcelo: Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Multidisciplinario de Biología Celular. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Multidisciplinario de Biología Celular. Universidad Nacional de La Plata. Instituto Multidisciplinario de Biología Celular; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Naturales y Museo; Argentina
Bustamante, Carlos D.: Stanford University; United States
Kenny, Eimear: Stanford University; United States. Charles Bronfman Institute of Personalized Medicine; United States. Icahn School of Medicine at Mount Sinai; United States
Subjects
GENOTYPE BY SEQUENCING
NGS
REDUCED REPRESENTATION LIBRARIES
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/66656

Field of science classification
https://purl.org/becyt/ford/1.6
https://purl.org/becyt/ford/1
Handle
http://hdl.handle.net/11336/66656
Citation
Cooke, Thomas F.; Yee, Muh-Ching; Muzzio, Marina; Sockell, Alexandra; Bell, Ryan; et al. GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data. PLOS Genetics 12(2): 1-18. Public Library of Science; February 2016. ISSN 1553-7390
DOI
10.1371/journal.pgen.1005631
Publisher's page
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005631
Format
application/pdf