GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data
- Authors
- Cooke, T. F.; Yee, M.-C.; Muzzio, Marina; Sockell, A.; Bell, R.; Cornejo, O. E.; Kelley, J. L.; Bailliet, Graciela; Bravi, Claudio Marcelo; Bustamante, Carlos D.; Kenny, E. E.
- Publication year
- 2016
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.
Facultad de Ciencias Naturales y Museo
Instituto Multidisciplinario de Biología Celular
- Subject
- Exact Sciences; GBS; genetic variation
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by/4.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/87069
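
The description above names Hardy-Weinberg equilibrium (HWE) p-values as a commonly used filter for genotype errors. As a minimal sketch of what such a filter computes, the Python function below runs a chi-square goodness-of-fit test (1 degree of freedom) on genotype counts at one biallelic site. It is not the GBStools model; the function name `hwe_chi2_pvalue` and the example counts are assumptions made for illustration. Allelic dropout makes some heterozygotes look homozygous, so an affected site tends to show a heterozygote deficit and a small p-value.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) p-value for Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are observed genotype counts at one biallelic site.
    Illustrative helper only; not part of GBStools.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: nothing to test

    # Expected counts under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts: a site in HWE versus one with a heterozygote deficit.
balanced = hwe_chi2_pvalue(25, 50, 25)
deficit = hwe_chi2_pvalue(45, 10, 45)
```

A filter of this kind would drop sites whose p-value falls below a chosen threshold. The threshold is a user choice and is not specified in this record.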
id | SEDICI_be9c2e44c7d5fbb5ffe6fe20a0708cbb |
---|---|
oai_identifier_str | oai:sedici.unlp.edu.ar:10915/87069 |
network_acronym_str | SEDICI |
repository_id_str | 1329 |
network_name_str | SEDICI (UNLP) |
publishDate | 2016 |
dc.date.none.fl_str_mv | 2016 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; Articulo; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/87069 |
url | http://sedici.unlp.edu.ar/handle/10915/87069 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/issn/1553-7390; info:eu-repo/semantics/altIdentifier/doi/10.1371/journal.pgen.1005631 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by/4.0/; Creative Commons Attribution 4.0 International (CC BY 4.0) |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | http://creativecommons.org/licenses/by/4.0/; Creative Commons Attribution 4.0 International (CC BY 4.0) |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
reponame_str | SEDICI (UNLP) |
collection | SEDICI (UNLP) |
instname_str | Universidad Nacional de La Plata |
instacron_str | UNLP |
institution | UNLP |
repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |