MicroRNA discovery in the human parasite Echinococcus multilocularis from genome-wide data
- Authors
- Kamenetzky, Laura; Stegmayer, Georgina; Maldonado, Lucas Luciano; Macchiaroli, Natalia; Yones, Cristian Ariel; Milone, Diego Humberto
- Publication year
- 2016
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- The cestode parasite Echinococcus multilocularis is the aetiological agent of alveolar echinococcosis, a disease responsible for considerable human morbidity and mortality. It is a worldwide zoonosis of major public health concern and is considered a neglected disease by the World Health Organization. The complete genome of E. multilocularis was recently sequenced and assembled in a collaborative effort between the Wellcome Trust Sanger Institute and our group, with the main aim of analyzing protein-coding genes. These analyses suggested that approximately 10% of the E. multilocularis genome is composed of protein-coding regions, so a vast proportion of the genome remains to be explored, including non-coding RNAs such as small RNAs (sRNAs). This class of small regulatory RNAs includes microRNAs (miRNAs), which have been identified in many different organisms ranging from viruses to higher eukaryotes. miRNAs are a key mechanism for regulating gene expression at the post-transcriptional level and play important roles in biological processes such as development, proliferation, cell differentiation and metabolism in animals and plants. Nevertheless, identifying miRNAs directly from genome-wide data alone is still a very challenging task. Many miRNAs remain unidentified because of the lack of either sequence information for particular phyla or appropriate algorithms to identify novel miRNAs. The motivation for this work is the discovery of new miRNAs in E. multilocularis based on non-target genomic data only, in order to obtain useful information from the currently available unexplored data.
In this work, we present the discovery of new pre-miRNAs in the E. multilocularis genome through a novel approach based on machine learning. We extracted the most commonly used structural features from the folded sequences of the parasite genome: triplets, minimum free energy and sequence length. These features were used to train a novel deep architecture of self-organizing maps (SOMs). This model can be trained with a high class imbalance and without the artificial definition of a negative class. We discovered 886 pre-miRNA candidates within the E. multilocularis genome-wide data. Experimental validation by small RNA-seq analysis then showed 23 pre-miRNA candidates with a pattern compatible with miRNA biogenesis, marking them as high-confidence miRNAs. New pre-miRNA candidates in E. multilocularis were thus discovered using non-target genomic data only: predictions were meaningful using only sequence data, with no need for RNA-seq data or target analysis. Furthermore, the methodology can be easily adapted and applied to any draft genome. Draft genomes are the most interesting case, since most non-model organisms have this status and carry real biological and sanitary relevance.
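The three feature types named in the abstract (triplets, minimum free energy and sequence length) can be illustrated with a short sketch. It assumes the widely used triplet-element definition, in which each position contributes its nucleotide plus the paired/unpaired state of itself and its two neighbours, giving 32 normalized frequencies. This record does not describe the authors' exact feature extraction, so the function names `triplet_features` and `feature_vector` are hypothetical, and the dot-bracket structure and minimum free energy are assumed to come from an external RNA folding tool.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"
# 32 triplet elements: middle nucleotide + paired "(" / unpaired "." state
# of three adjacent positions (assumed definition, see lead-in).
TRIPLET_KEYS = [n + "".join(s) for n in NUCLEOTIDES for s in product("(.", repeat=3)]


def triplet_features(sequence: str, structure: str) -> dict:
    """Return normalized triplet-element frequencies for one folded hairpin."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    if not set(structure) <= set("()."):
        raise ValueError("structure must be in dot-bracket notation")
    seq = sequence.upper().replace("T", "U")
    # Only paired vs. unpaired matters, so ")" is mapped to "(".
    states = structure.replace(")", "(")
    counts = Counter()
    for i in range(1, len(seq) - 1):
        if seq[i] in NUCLEOTIDES:  # skip ambiguous bases such as N
            counts[seq[i] + states[i - 1:i + 2]] += 1
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in TRIPLET_KEYS}


def feature_vector(sequence: str, structure: str, mfe: float) -> list:
    """Concatenate 32 triplet frequencies, the MFE and the sequence length."""
    feats = triplet_features(sequence, structure)
    return [feats[k] for k in TRIPLET_KEYS] + [mfe, float(len(sequence))]
```

Calling `feature_vector("GGGAAACCC", "(((...)))", -3.2)` on a toy hairpin (the energy value here is made up for illustration) returns a 34-element vector that could be fed to a classifier such as a self-organizing map.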
Affiliation: Kamenetzky, Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Maldonado, Lucas Luciano. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Macchiaroli, Natalia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Yones, Cristian Ariel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
- Subject
- Machine Learning
- Self Organizing Map
- MicroRNA
- Echinococcus multilocularis
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/38958
Full record metadata
id | CONICETDig_e154ec1b319d11d56ff723f61b7c7e94 |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/38958 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | MicroRNA discovery in the human parasite Echinococcus multilocularis from genome-wide data |
dc.creator.none.fl_str_mv | Kamenetzky, Laura; Stegmayer, Georgina; Maldonado, Lucas Luciano; Macchiaroli, Natalia; Yones, Cristian Ariel; Milone, Diego Humberto |
author | Kamenetzky, Laura |
author_facet | Kamenetzky, Laura; Stegmayer, Georgina; Maldonado, Lucas Luciano; Macchiaroli, Natalia; Yones, Cristian Ariel; Milone, Diego Humberto |
author_role | author |
author2 | Stegmayer, Georgina; Maldonado, Lucas Luciano; Macchiaroli, Natalia; Yones, Cristian Ariel; Milone, Diego Humberto |
author2_role | author; author; author; author; author |
dc.subject.none.fl_str_mv | Machine Learning; Self Organizing Map; Microrna; Echinococcus Multilocularis |
topic | Machine Learning; Self Organizing Map; Microrna; Echinococcus Multilocularis |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2; https://purl.org/becyt/ford/1; https://purl.org/becyt/ford/3.3; https://purl.org/becyt/ford/3 |
publishDate | 2016 |
dc.date.none.fl_str_mv | 2016-04 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/38958; Kamenetzky, Laura; Stegmayer, Georgina; Maldonado, Lucas Luciano; Macchiaroli, Natalia; Yones, Cristian Ariel; et al.; MicroRNA discovery in the human parasite Echinococcus multilocularis from genome-wide data; Academic Press Inc Elsevier Science; Genomics; 107; 6; 4-2016; 274-280; 0888-7543; CONICET Digital; CONICET |
url | http://hdl.handle.net/11336/38958 |
identifier_str_mv | Kamenetzky, Laura; Stegmayer, Georgina; Maldonado, Lucas Luciano; Macchiaroli, Natalia; Yones, Cristian Ariel; et al.; MicroRNA discovery in the human parasite Echinococcus multilocularis from genome-wide data; Academic Press Inc Elsevier Science; Genomics; 107; 6; 4-2016; 274-280; 0888-7543; CONICET Digital; CONICET |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/http://www.sciencedirect.com/science/article/pii/S0888754316300234; info:eu-repo/semantics/altIdentifier/doi/10.1016/j.ygeno.2016.04.002 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | Academic Press Inc Elsevier Science |
publisher.none.fl_str_mv | Academic Press Inc Elsevier Science |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET); instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |