GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data
- Authors
- Cooke, Thomas F.; Yee, Muh-Ching; Muzzio, Marina; Sockell, Alexandra; Bell, Ryan; Cornejo, Omar E.; Kelley, Joanna L.; Bailliet, Graciela; Bravi, Claudio Marcelo; Bustamante, Carlos D.; Kenny, Eimear
- Publication year
- 2016
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.
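The abstract compares GBStools against commonly used filters such as Hardy-Weinberg equilibrium p-values. As a rough illustration of that kind of baseline filter (not of the GBStools model itself, which this record does not describe), the sketch below computes a chi-square Hardy-Weinberg p-value from genotype counts at one site. The function name `hwe_chi2_pvalue` and the choice of a chi-square test rather than an exact test are assumptions made for this example.

```python
import math


def hwe_chi2_pvalue(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Chi-square test (1 d.f.) for deviation from Hardy-Weinberg proportions.

    Hypothetical helper for illustration; not part of GBStools.
    """
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        return 1.0
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


# A site in perfect Hardy-Weinberg proportions versus a site where every
# heterozygote has been called as a homozygote.
balanced = hwe_chi2_pvalue(25, 50, 25)
no_hets = hwe_chi2_pvalue(50, 0, 50)
```

A filter of this kind would discard sites whose p-value falls below a chosen threshold. Allelic dropout tends to make heterozygotes look like homozygotes, so affected sites can show a heterozygote deficit, which is one reason such filters are applied to GBS data.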
Affiliation: Cooke, Thomas F.. University of Stanford; United States
Affiliation: Yee, Muh-Ching. Carnegie Institution for Science; United States
Affiliation: Muzzio, Marina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Naturales y Museo; Argentina. University of Stanford; United States. Charles Bronfman Institute of Personalized Medicine; United States
Affiliation: Sockell, Alexandra. University of Stanford; United States
Affiliation: Bell, Ryan. University of Stanford; United States
Affiliation: Cornejo, Omar E.. University of Stanford; United States. Washington State University; United States
Affiliation: Kelley, Joanna L.. University of Stanford; United States. Washington State University; United States
Affiliation: Bailliet, Graciela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Multidisciplinario de Biología Celular. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Multidisciplinario de Biología Celular. Universidad Nacional de La Plata. Instituto Multidisciplinario de Biología Celular; Argentina
Affiliation: Bravi, Claudio Marcelo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Multidisciplinario de Biología Celular. Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Multidisciplinario de Biología Celular. Universidad Nacional de La Plata. Instituto Multidisciplinario de Biología Celular; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Naturales y Museo; Argentina
Affiliation: Bustamante, Carlos D.. University of Stanford; United States
Affiliation: Kenny, Eimear. University of Stanford; United States. Charles Bronfman Institute of Personalized Medicine; United States. Icahn School of Medicine at Mount Sinai; United States
- Subject
- GENOTYPE BY SEQUENCING; NGS; REDUCED REPRESENTATION LIBRARIES
- Accessibility level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/66656
View the full record metadata
id |
CONICETDig_2f7e4340c8e489a6be829d7fb2ccf246 |
---|---|
oai_identifier_str |
oai:ri.conicet.gov.ar:11336/66656 |
network_acronym_str |
CONICETDig |
repository_id_str |
3498 |
network_name_str |
CONICET Digital (CONICET) |
dc.title.none.fl_str_mv |
GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data |
title |
GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data |
dc.creator.none.fl_str_mv |
Cooke, Thomas F. Yee, Muh-Ching Muzzio, Marina Sockell, Alexandra Bell, Ryan Cornejo, Omar E. Kelley, Joanna L. Bailliet, Graciela Bravi, Claudio Marcelo Bustamante, Carlos D. Kenny, Eimear |
author |
Cooke, Thomas F. |
author_facet |
Cooke, Thomas F. Yee, Muh-Ching Muzzio, Marina Sockell, Alexandra Bell, Ryan Cornejo, Omar E. Kelley, Joanna L. Bailliet, Graciela Bravi, Claudio Marcelo Bustamante, Carlos D. Kenny, Eimear |
author_role |
author |
author2 |
Yee, Muh-Ching Muzzio, Marina Sockell, Alexandra Bell, Ryan Cornejo, Omar E. Kelley, Joanna L. Bailliet, Graciela Bravi, Claudio Marcelo Bustamante, Carlos D. Kenny, Eimear |
author2_role |
author author author author author author author author author author |
dc.subject.none.fl_str_mv |
GENOTYPE BY SEQUENCING NGS REDUCED REPRESENTATION LIBRARIES |
topic |
GENOTYPE BY SEQUENCING NGS REDUCED REPRESENTATION LIBRARIES |
purl_subject.fl_str_mv |
https://purl.org/becyt/ford/1.6 https://purl.org/becyt/ford/1 |
publishDate |
2016 |
dc.date.none.fl_str_mv |
2016-02 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11336/66656 Cooke, Thomas F.; Yee, Muh-Ching; Muzzio, Marina; Sockell, Alexandra; Bell, Ryan; et al.; GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data; Public Library of Science; Plos Genetics; 12; 2; 2-2016; 1-18 1553-7390 CONICET Digital CONICET |
url |
http://hdl.handle.net/11336/66656 |
identifier_str_mv |
Cooke, Thomas F.; Yee, Muh-Ching; Muzzio, Marina; Sockell, Alexandra; Bell, Ryan; et al.; GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data; Public Library of Science; Plos Genetics; 12; 2; 2-2016; 1-18 1553-7390 CONICET Digital CONICET |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/doi/10.1371/journal.pgen.1005631 info:eu-repo/semantics/altIdentifier/url/https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005631 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv |
application/pdf application/pdf |
dc.publisher.none.fl_str_mv |
Public Library of Science |
publisher.none.fl_str_mv |
Public Library of Science |
dc.source.none.fl_str_mv |
reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str |
CONICET Digital (CONICET) |
collection |
CONICET Digital (CONICET) |
instname_str |
Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv |
CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv |
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |