Separating homeologs by phasing in the tetraploid wheat transcriptome

Authors
Krasileva, Ksenia V.; Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; Tabbita, Facundo; Soria, Marcelo Abel; Wang, Shichen; Akhunov, Eduard; Uauy, Cristobal; Dubcovsky, Jorge
Publication year
2013
Language
English
Resource type
article
Status
published version
Description
Background: The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs.
Results: A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble into 140,118 contigs, including 96% of the benchmark cDNAs. We use a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22% relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7% of SNPs analyzed are correctly separated by phasing.
Conclusions: Our study shows that de novo transcriptome assembly of tetraploid wheat benefits more from multiple k-mer assembly strategies than that of diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies.
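The read-sorting step named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes phasing has already produced two haplotypes per contig, each a map from SNP position to base, and that each read has been reduced to the bases it shows at those positions. The function names, haplotype labels, positions, and read identifiers are all hypothetical.

```python
from collections import defaultdict

def assign_read(read_alleles, haplotypes):
    """Return the name of the haplotype that best matches a read, or None.

    read_alleles: {snp_position: base observed in the read}
    haplotypes:   {haplotype_name: {snp_position: phased base}}
    A read is left unassigned when it covers no phased SNP or when
    two haplotypes match it equally well.
    """
    scores = {
        name: sum(1 for pos, base in read_alleles.items() if hap.get(pos) == base)
        for name, hap in haplotypes.items()
    }
    best = max(scores.values(), default=0)
    winners = [name for name, score in scores.items() if score == best]
    if best == 0 or len(winners) != 1:
        return None
    return winners[0]

def sort_reads(reads, haplotypes):
    """Group read identifiers by assigned haplotype (hypothetical helper)."""
    bins = defaultdict(list)
    for read_id, alleles in reads.items():
        bins[assign_read(alleles, haplotypes) or "unassigned"].append(read_id)
    return dict(bins)

# Illustrative data only: two phased haplotypes over two SNP positions.
haplotypes = {"A": {120: "G", 245: "T"}, "B": {120: "A", 245: "C"}}
reads = {
    "r1": {120: "G"},            # matches A at one SNP
    "r2": {120: "A", 245: "C"},  # matches B at both SNPs
    "r3": {300: "T"},            # covers no phased SNP
    "r4": {120: "G", 245: "C"},  # one match each: ambiguous
}
sorted_reads = sort_reads(reads, haplotypes)
```

In a sketch like this, each haplotype bin would then be re-assembled separately, while unassigned reads would need a separate policy, such as adding them to both bins.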
Affiliation: Krasileva, Ksenia V. University of California at Davis; United States
Affiliation: Buffalo, Vince. University of California at Davis; United States
Affiliation: Bailey, Paul. Norwich Research Park; United Kingdom
Affiliation: Pearce, Stephen. University of California at Davis; United States
Affiliation: Ayling, Sarah. Norwich Research Park; United Kingdom
Affiliation: Tabbita, Facundo. University of California at Davis; United States
Affiliation: Soria, Marcelo Abel. University of California at Davis; United States. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones en Biociencias Agrícolas y Ambientales. Universidad de Buenos Aires. Facultad de Agronomía. Instituto de Investigaciones en Biociencias Agrícolas y Ambientales; Argentina
Affiliation: Wang, Shichen. Kansas State University; United States
Affiliation: Akhunov, Eduard. Kansas State University; United States
Affiliation: Uauy, Cristobal. Norwich Research Park; United Kingdom
Affiliation: Dubcovsky, Jorge. University of California at Davis; United States. Howard Hughes Medical Institute; United States
Subject
GENE PREDICTION
MULTIPLE K-MER ASSEMBLY
PHASING
POLYPLOID
PSEUDOGENES
TRANSCRIPTOME ASSEMBLY
TRITICUM TURGIDUM
TRITICUM URARTU
WHEAT
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/86398

id CONICETDig_0cd4807621dd7902fc4efe0bfc9ef4e8
oai_identifier_str oai:ri.conicet.gov.ar:11336/86398
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
purl_subject.fl_str_mv https://purl.org/becyt/ford/4.4
https://purl.org/becyt/ford/4
publishDate 2013
dc.date.none.fl_str_mv 2013-06
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/86398
Krasileva, Ksenia V.; Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; et al.; Separating homeologs by phasing in the tetraploid wheat transcriptome; BioMed Central; Genome Biology; 14; 6; 6-2013; 1-19
1474-760X
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-6-r66
info:eu-repo/semantics/altIdentifier/doi/10.1186/gb-2013-14-6-r66
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv BioMed Central
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar