Separating homeologs by phasing in the tetraploid wheat transcriptome
- Authors
- Krasileva, Ksenia V.; Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; Tabbita, Facundo; Soria, Marcelo Abel; Wang, Shichen; Akhunov, Eduard; Uauy, Cristobal; Dubcovsky, Jorge
- Publication year
- 2013
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Affiliation: Krasileva, Ksenia V. University of California. Dept. Plant Sciences. Davis, CA 95616, USA.
Affiliation: Buffalo, Vince. University of California. Dept. Plant Sciences. Davis, CA 95616, USA.
Affiliation: Bailey, Paul. The Genome Analysis Centre. Norwich Research Park. Norwich NR4 7UH, UK.
Affiliation: Pearce, Stephen. University of California. Dept. Plant Sciences. Davis, CA 95616, USA.
Affiliation: Ayling, Sarah. The Genome Analysis Centre. Norwich Research Park. Norwich NR4 7UH, UK.
Affiliation: Tabbita, Facundo. University of California. Dept. Plant Sciences. Davis, CA 95616, USA.
Affiliation: Soria, Marcelo Abel. University of California. Dept. Plant Sciences. Davis, CA 95616, USA.
Affiliation: Soria, Marcelo Abel. Universidad de Buenos Aires. Facultad de Agronomía. Departamento de Biología Aplicada y Alimentos. Cátedra de Microbiología Agrícola. Buenos Aires, Argentina.
Affiliation: Soria, Marcelo Abel. Universidad de Buenos Aires. Facultad de Agronomía. Instituto de Investigaciones en Biociencias Agrícolas y Ambientales (INBA). Buenos Aires, Argentina.
Affiliation: Soria, Marcelo Abel. CONICET – Universidad de Buenos Aires. Instituto de Investigaciones en Biociencias Agrícolas y Ambientales (INBA). Buenos Aires, Argentina.
Affiliation: Wang, Shichen. Kansas State University. Department of Plant Pathology. Manhattan, KS 66506, USA.
Affiliation: Akhunov, Eduard. Kansas State University. Department of Plant Pathology. Manhattan, KS 66506, USA.
Affiliation: Uauy, Cristobal. John Innes Centre. Norwich Research Park. Norwich NR4 7UH, UK.
Affiliation: Dubcovsky, Jorge. University of California. Dept. Plant Sciences. Davis, CA 95616, USA.
Affiliation: Dubcovsky, Jorge. Howard Hughes Medical Institute. Chevy Chase, MD 20815, USA.
Background: The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs.
Results: A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble into 140,118 contigs, including 96 percent of the benchmark cDNAs. We used a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22 percent relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7 percent of SNPs analyzed are correctly separated by phasing.
Conclusions: Our study shows that de novo transcriptome assembly of tetraploid wheat benefits from multiple k-mer assembly strategies more than that of diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies.
- Source
- Genome Biology
Vol.14, no.6
14:R66
http://genomebiology.com/
- Subjects
- GENE PREDICTION
PHASING
POLYPLOID
PSEUDOGENES
TRANSCRIPTOME ASSEMBLY
TRITICUM TURGIDUM
TRITICUM URARTU
WHEAT
CONTIG
PROTEOME
TRANSCRIPTOME
CONTROLLED STUDY
DIPLOIDY
GENE SEQUENCE
GENOME
GENOMICS
HETEROZYGOTE
HOMEOLOG
NONHUMAN
OPEN READING FRAME
PLANT GENOME
SINGLE NUCLEOTIDE POLYMORPHISM
TETRAPLOIDY
TRITICUM AESTIVUM
MULTIPLE K-MER ASSEMBLY
PHASING
COMPLEMENTARY DNA
- Access level
- open access
- Terms of use
- open access
- Repository
- Institution
- Universidad de Buenos Aires. Facultad de Agronomía
- OAI identifier
- snrd:2013krasileva
View the full metadata record
id | FAUBA_035c0c3ce42cd2e0bcd4cbd07b2da447 |
---|---|
oai_identifier_str | snrd:2013krasileva |
network_acronym_str | FAUBA |
repository_id_str | 2729 |
network_name_str | FAUBA Digital (UBA-FAUBA) |
dc.title.none.fl_str_mv | Separating homeologs by phasing in the tetraploid wheat transcriptome |
title | Separating homeologs by phasing in the tetraploid wheat transcriptome |
title_short | Separating homeologs by phasing in the tetraploid wheat transcriptome |
title_full | Separating homeologs by phasing in the tetraploid wheat transcriptome |
title_fullStr | Separating homeologs by phasing in the tetraploid wheat transcriptome |
title_full_unstemmed | Separating homeologs by phasing in the tetraploid wheat transcriptome |
title_sort | Separating homeologs by phasing in the tetraploid wheat transcriptome |
dc.creator.none.fl_str_mv | Krasileva, Ksenia V.; Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; Tabbita, Facundo; Soria, Marcelo Abel; Wang, Shichen; Akhunov, Eduard; Uauy, Cristobal; Dubcovsky, Jorge |
author | Krasileva, Ksenia V. |
author_facet | Krasileva, Ksenia V.; Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; Tabbita, Facundo; Soria, Marcelo Abel; Wang, Shichen; Akhunov, Eduard; Uauy, Cristobal; Dubcovsky, Jorge |
author_role | author |
author2 | Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; Tabbita, Facundo; Soria, Marcelo Abel; Wang, Shichen; Akhunov, Eduard; Uauy, Cristobal; Dubcovsky, Jorge |
author2_role | author; author; author; author; author; author; author; author; author; author |
dc.subject.none.fl_str_mv | GENE PREDICTION; PHASING; POLYPLOID; PSEUDOGENES; TRANSCRIPTOME ASSEMBLY; TRITICUM TURGIDUM; TRITICUM URARTU; WHEAT; CONTIG; PROTEOME; TRANSCRIPTOME; CONTROLLED STUDY; DIPLOIDY; GENE SEQUENCE; GENOME; GENOMICS; HETEROZYGOTE; HOMEOLOG; NONHUMAN; OPEN READING FRAME; PLANT GENOME; SINGLE NUCLEOTIDE POLYMORPHISM; TETRAPLOIDY; TRITICUM AESTIVUM; MULTIPLE K-MER ASSEMBLY; PHASING; COMPLEMENTARY DNA |
topic | GENE PREDICTION; PHASING; POLYPLOID; PSEUDOGENES; TRANSCRIPTOME ASSEMBLY; TRITICUM TURGIDUM; TRITICUM URARTU; WHEAT; CONTIG; PROTEOME; TRANSCRIPTOME; CONTROLLED STUDY; DIPLOIDY; GENE SEQUENCE; GENOME; GENOMICS; HETEROZYGOTE; HOMEOLOG; NONHUMAN; OPEN READING FRAME; PLANT GENOME; SINGLE NUCLEOTIDE POLYMORPHISM; TETRAPLOIDY; TRITICUM AESTIVUM; MULTIPLE K-MER ASSEMBLY; PHASING; COMPLEMENTARY DNA |
publishDate | 2013 |
dc.date.none.fl_str_mv | 2013 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article; publishedVersion; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | doi:10.1186/gb-2013-14-6-r66; issn:1474-760X; http://ri.agro.uba.ar/greenstone3/library/collection/arti/document/2013krasileva |
identifier_str_mv | doi:10.1186/gb-2013-14-6-r66; issn:1474-760X |
url | http://ri.agro.uba.ar/greenstone3/library/collection/arti/document/2013krasileva |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; openAccess; http://ri.agro.uba.ar/greenstone3/library/page/biblioteca#section4 |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | openAccess; http://ri.agro.uba.ar/greenstone3/library/page/biblioteca#section4 |
dc.format.none.fl_str_mv | application/pdf |
dc.source.none.fl_str_mv | Genome Biology; Vol.14, no.6; 14:R66; http://genomebiology.com/; reponame:FAUBA Digital (UBA-FAUBA); instname:Universidad de Buenos Aires. Facultad de Agronomía |
reponame_str | FAUBA Digital (UBA-FAUBA) |
collection | FAUBA Digital (UBA-FAUBA) |
instname_str | Universidad de Buenos Aires. Facultad de Agronomía |
repository.name.fl_str_mv | FAUBA Digital (UBA-FAUBA) - Universidad de Buenos Aires. Facultad de Agronomía |
repository.mail.fl_str_mv | martino@agro.uba.ar;berasa@agro.uba.ar |
_version_ | 1844618854471303168 |
score | 13.070432 |
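
The abstract in this record lists read sorting as one step of the post-assembly pipeline, after SNP phasing and before re-assembly of the phased reads. The record does not say which tools or data structures the authors used, so the following is only a toy sketch of the general idea: once SNPs on a merged contig have been phased into two haplotypes (one per homoeolog), each read can be assigned to the haplotype whose alleles it carries. The function name `sort_reads_by_phase`, the tuple layout of the reads, and the majority-vote rule are all assumptions made for illustration, not the published method.

```python
from collections import defaultdict


def sort_reads_by_phase(reads, phased_snps):
    """Assign reads to haplotype bins by majority vote over phased SNPs.

    reads: list of (name, start, sequence) tuples; start is the 0-based
        position of the read on the merged contig (ungapped alignment,
        a simplifying assumption of this sketch).
    phased_snps: dict mapping contig position -> (allele_hap0, allele_hap1).
    """
    bins = defaultdict(list)
    for name, start, seq in reads:
        votes = [0, 0]
        for pos, alleles in phased_snps.items():
            offset = pos - start
            if 0 <= offset < len(seq):  # the read covers this SNP
                base = seq[offset]
                for hap in (0, 1):
                    if base == alleles[hap]:
                        votes[hap] += 1
        if votes[0] > votes[1]:
            bins["hap0"].append(name)
        elif votes[1] > votes[0]:
            bins["hap1"].append(name)
        else:
            # No informative SNP covered, or conflicting evidence.
            bins["unassigned"].append(name)
    return dict(bins)


# Hypothetical example: two phased SNPs on a merged contig.
phased_snps = {3: ("A", "G"), 8: ("C", "T")}
reads = [
    ("r1", 0, "TTTAGGGGC"),  # carries A at 3 and C at 8
    ("r2", 0, "TTTGGGGGT"),  # carries G at 3 and T at 8
    ("r3", 4, "GGGGT"),      # covers only position 8, carries T
    ("r4", 0, "TTT"),        # covers no phased SNP
]
result = sort_reads_by_phase(reads, phased_snps)
print(result)
```

In a real workflow each bin of reads would then be re-assembled on its own, which is the final step the abstract names; reads left unassigned here would need a separate policy that this sketch does not attempt to model.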