Quality control of genotypes using heritability estimates of gene content at the marker

Authors
Forneris, Natalia Soledad; Legarra, Andrés L.; Vitezica, Zulma G.; Tsuruta, Shogo; Aguilar, Ignacio; Misztal, Ignacy; Cantet, Rodolfo Juan Carlos
Publication year
2015
Language
English
Resource type
article
Status
published version
Description
Quality control filtering of single nucleotide polymorphisms (SNPs) is a key step when analyzing genomic data. Here, we present a practical method to identify low-quality SNPs, meaning markers whose genotypes are wrongly assigned for a large proportion of individuals, by estimating the heritability of gene content at each marker. Gene content is the number of copies of a particular reference allele in the genotype of an animal (0, 1, or 2). If there is no mutation at the marker, gene content has an additive heritability of 1 by construction. The method uses restricted maximum likelihood (REML) to estimate the heritability of gene content at each SNP and builds a likelihood ratio test statistic to test for zero error variance in genotyping. As a byproduct, estimates of the allele frequencies of the markers in the base population are obtained. Using simulated data with 10% permutation error (4% actual error) in genotyping, the method had a specificity of 0.96 (4% of correct markers are rejected) and a sensitivity of 0.99 (1% of wrong markers are accepted) when markers with heritability lower than 0.975 were discarded. Checking for Mendelian errors resulted in a lower sensitivity (0.84) for the same simulation. The method is further illustrated with a real dataset of 3,534 animals genotyped for 50,433 markers from the Illumina PorcineSNP60 chip, with a pedigree of 6,473 individuals; these markers had undergone very little quality control. A total of 4,099 markers with p-values lower than 0.01 were discarded based on our method, with associated heritability estimates as low as 0.12. Unlike other techniques, our method uses all the information in the population simultaneously, can be applied to any population with marker and pedigree records, and is simple to implement using standard software for REML estimation. Scripts for its use are provided.
Affiliation: Forneris, Natalia Soledad. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario; Argentina. Universidad de Buenos Aires. Facultad de Agronomía. Departamento de Producción Animal. Cátedra de Mejoramiento Genético Animal; Argentina
Affiliation: Legarra, Andrés L. Institut National de la Recherche Agronomique; France. Université de Toulouse - Le Mirail; France
Affiliation: Vitezica, Zulma G. Université de Toulouse - Le Mirail; France. Institut National de la Recherche Agronomique; France
Affiliation: Tsuruta, Shogo. University of Georgia; United States
Affiliation: Aguilar, Ignacio. Instituto Nacional de Investigación Agropecuaria; Uruguay
Affiliation: Misztal, Ignacy. University of Georgia; United States
Affiliation: Cantet, Rodolfo Juan Carlos. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario; Argentina. Universidad de Buenos Aires. Facultad de Agronomía. Departamento de Producción Animal. Cátedra de Mejoramiento Genético Animal; Argentina
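The idea in the description, that gene content behaves like a trait with heritability 1 unless genotypes are miscalled, can be illustrated with a small simulation. The sketch below is not the authors' REML procedure or their scripts. It swaps in a much simpler estimator, the regression of offspring gene content on mid-parent gene content, whose slope estimates additive heritability. The function names, the two-letter genotype coding with "A" as the reference allele, the trio design, and the simulation settings (5,000 trios, allele frequency 0.3, seed 2015) are all assumptions made for illustration.

```python
import random


def gene_content(genotype):
    """Count copies of the reference allele 'A' in a genotype such as 'AB' (assumed coding)."""
    return genotype.count("A")


def simulate_trios(n, freq, rng):
    """Simulate n (sire, dam, offspring) gene contents under Mendelian inheritance."""
    trios = []
    for _ in range(n):
        sire = [int(rng.random() < freq), int(rng.random() < freq)]
        dam = [int(rng.random() < freq), int(rng.random() < freq)]
        # Each parent transmits one of its two alleles at random.
        child = rng.choice(sire) + rng.choice(dam)
        trios.append((sum(sire), sum(dam), child))
    return trios


def permute_offspring(trios, fraction, rng):
    """Mimic genotyping error by shuffling a fraction of offspring genotypes among animals."""
    picked = rng.sample(range(len(trios)), int(fraction * len(trios)))
    donors = picked[:]
    rng.shuffle(donors)
    noisy = list(trios)
    for target, donor in zip(picked, donors):
        sire, dam, _ = trios[target]
        noisy[target] = (sire, dam, trios[donor][2])
    return noisy


def midparent_slope(trios):
    """Slope of offspring on mid-parent gene content (a simple heritability estimate)."""
    x = [(sire + dam) / 2 for sire, dam, _ in trios]
    y = [child for _, _, child in trios]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(2015)
trios = simulate_trios(5000, 0.3, rng)
clean = midparent_slope(trios)
noisy = midparent_slope(permute_offspring(trios, 0.10, rng))
print(f"slope with correct genotypes:  {clean:.3f}")
print(f"slope with 10% permuted calls: {noisy:.3f}")
```

With correct genotypes the slope should sit close to 1, and permuting a share of the calls pulls it down. Because a permuted animal often receives the same genotype it already had, the share of genotypes that actually change is smaller than the share permuted, which is consistent with the description's distinction between permutation error and actual error.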
Subject
Gene Content
Quality Control
SNP
Genomic Selection
REML
Shared Data Resource
GenPred
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/44365

id CONICETDig_c4544f6bc089e3aa7206b3b0d705ec01
oai_identifier_str oai:ri.conicet.gov.ar:11336/44365
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
purl_subject.fl_str_mv https://purl.org/becyt/ford/4.2
https://purl.org/becyt/ford/4
publishDate 2015
dc.date.none.fl_str_mv 2015-03
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/44365
Forneris, Natalia Soledad; Legarra, Andrés L.; Vitezica, Zulma G.; Tsuruta, Shogo; Aguilar, Ignacio; et al.; Quality control of genotypes using heritability estimates of gene content at the marker; Genetics Society of America; Genetics; 199; 3; 3-2015; 675-681
0016-6731
1943-2631
CONICET Digital
CONICET
url http://hdl.handle.net/11336/44365
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/http://www.genetics.org/content/199/3/675
info:eu-repo/semantics/altIdentifier/doi/10.1534/genetics.114.173559
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
application/pdf
dc.publisher.none.fl_str_mv Genetics Society of America
publisher.none.fl_str_mv Genetics Society of America
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar