GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data

Authors
Cooke, T. F.; Yee, M.-C.; Muzzio, Marina; Sockell, A.; Bell, R.; Cornejo, O. E.; Kelley, J. L.; Bailliet, Graciela; Bravi, Claudio Marcelo; Bustamante, Carlos D.; Kenny, E. E.
Publication year
2016
Language
English
Resource type
article
Status
published version
Description
Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.
Facultad de Ciencias Naturales y Museo
Instituto Multidisciplinario de Biología Celular
Subject
Exact Sciences
GBS
genetic variation
Access level
open access
Terms of use
http://creativecommons.org/licenses/by/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/87069

URL
http://sedici.unlp.edu.ar/handle/10915/87069
ISSN
1553-7390
DOI
10.1371/journal.pgen.1005631