Separating homeologs by phasing in the tetraploid wheat transcriptome
- Authors
- Krasileva, Ksenia V.; Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; Tabbita, Facundo; Soria, Marcelo Abel; Wang, Shichen; Akhunov, Eduard; Uauy, Cristobal; Dubcovsky, Jorge
- Year of publication
- 2013
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Background: The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs. Results: A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble into 140,118 contigs, including 96% of the benchmark cDNAs. We used a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22% relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7% of SNPs analyzed are correctly separated by phasing. Conclusions: Our study shows that de novo transcriptome assembly of tetraploid wheat benefits from multiple k-mer assembly strategies more than that of diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies.
Affiliation: Krasileva, Ksenia V. University of California at Davis; United States
Affiliation: Buffalo, Vince. University of California at Davis; United States
Affiliation: Bailey, Paul. Norwich Research Park; United Kingdom
Affiliation: Pearce, Stephen. University of California at Davis; United States
Affiliation: Ayling, Sarah. Norwich Research Park; United Kingdom
Affiliation: Tabbita, Facundo. University of California at Davis; United States
Affiliation: Soria, Marcelo Abel. University of California at Davis; United States. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones en Biociencias Agrícolas y Ambientales. Universidad de Buenos Aires. Facultad de Agronomía. Instituto de Investigaciones en Biociencias Agrícolas y Ambientales; Argentina
Affiliation: Wang, Shichen. Kansas State University; United States
Affiliation: Akhunov, Eduard. Kansas State University; United States
Affiliation: Uauy, Cristobal. Norwich Research Park; United Kingdom
Affiliation: Dubcovsky, Jorge. University of California at Davis; United States. Howard Hughes Medical Institute; United States
- Subject
- GENE PREDICTION; MULTIPLE K-MER ASSEMBLY; PHASING; POLYPLOID; PSEUDOGENES; TRANSCRIPTOME ASSEMBLY; TRITICUM TURGIDUM; TRITICUM URARTU; WHEAT
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/86398
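
The description above mentions a post-assembly step in which reads are sorted according to phased SNPs before re-assembly. The following is a minimal illustrative sketch of that idea only, not the authors' pipeline: it assumes ungapped read alignments, exactly two phases labelled "A" and "B", and a simple majority vote. The function name `sort_read` and the `phased_snps` mapping are hypothetical.

```python
from collections import Counter


def sort_read(read_start, read_seq, phased_snps):
    """Assign one ungapped aligned read to phase 'A', 'B', or None.

    read_start  -- 0-based contig position of the first base of the read
    read_seq    -- read sequence as aligned to the contig (no indels assumed)
    phased_snps -- hypothetical mapping: contig position -> (allele_A, allele_B)
    """
    votes = Counter()
    for pos, (allele_a, allele_b) in phased_snps.items():
        offset = pos - read_start
        # Only SNP positions covered by the read can contribute a vote.
        if 0 <= offset < len(read_seq):
            base = read_seq[offset]
            if base == allele_a:
                votes["A"] += 1
            elif base == allele_b:
                votes["B"] += 1
    if votes["A"] > votes["B"]:
        return "A"
    if votes["B"] > votes["A"]:
        return "B"
    # No covered SNP, or conflicting evidence: leave the read unassigned.
    return None


# Two phased SNPs on a contig: position 3 (A vs G) and position 8 (C vs T).
phased_snps = {3: ("A", "G"), 8: ("C", "T")}
reads = [(0, "ACGATTTTC"), (2, "TGTTTTTT"), (10, "AAAA")]
bins = {"A": [], "B": [], None: []}
for start, seq in reads:
    bins[sort_read(start, seq, phased_snps)].append(seq)
```

In this sketch, reads in `bins["A"]` and `bins["B"]` would each go to a separate re-assembly, while unassigned reads need their own handling; how ties and uncovered reads are treated is a design choice this example does not take from the source.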
id | CONICETDig_0cd4807621dd7902fc4efe0bfc9ef4e8 |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/86398 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | Separating homeologs by phasing in the tetraploid wheat transcriptome |
dc.creator.none.fl_str_mv | Krasileva, Ksenia V.; Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; Tabbita, Facundo; Soria, Marcelo Abel; Wang, Shichen; Akhunov, Eduard; Uauy, Cristobal; Dubcovsky, Jorge |
dc.subject.none.fl_str_mv | GENE PREDICTION; MULTIPLE K-MER ASSEMBLY; PHASING; POLYPLOID; PSEUDOGENES; TRANSCRIPTOME ASSEMBLY; TRITICUM TURGIDUM; TRITICUM URARTU; WHEAT |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/4.4 https://purl.org/becyt/ford/4 |
publishDate | 2013 |
dc.date.none.fl_str_mv | 2013-06 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/86398 Krasileva, Ksenia V.; Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; et al.; Separating homeologs by phasing in the tetraploid wheat transcriptome; BioMed Central; Genome Biology; 14; 6; 6-2013; 1-19 1474-760X CONICET Digital CONICET |
url | http://hdl.handle.net/11336/86398 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-6-r66 info:eu-repo/semantics/altIdentifier/doi/10.1186/gb-2013-14-6-r66 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf application/pdf application/pdf |
dc.publisher.none.fl_str_mv | BioMed Central |
publisher.none.fl_str_mv | BioMed Central |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |