MicroRNA discovery in the human parasite Echinococcus multilocularis from genome-wide data

Authors
Kamenetzky, Laura; Stegmayer, Georgina; Maldonado, Lucas Luciano; Macchiaroli, Natalia; Yones, Cristian Ariel; Milone, Diego Humberto
Publication year
2016
Language
English
Resource type
article
Status
published version
Description
The cestode parasite Echinococcus multilocularis is the aetiological agent of alveolar echinococcosis, which is responsible for considerable human morbidity and mortality. This disease is a worldwide zoonosis of major public health concern and is considered a neglected disease by the World Health Organization. The complete genome of E. multilocularis was recently sequenced and assembled in a collaborative effort between the Wellcome Trust Sanger Institute and our group, with the main aim of analyzing protein-coding genes. These analyses suggested that approximately 10% of the E. multilocularis genome is composed of protein-coding regions, so a vast proportion of the genome remains to be explored, including non-coding RNAs such as small RNAs (sRNAs). This class of small regulatory RNAs includes microRNAs (miRNAs), which have been identified in many different organisms, ranging from viruses to higher eukaryotes. As key regulators of gene expression at the post-transcriptional level, miRNAs play important roles in biological processes such as development, proliferation, cell differentiation and metabolism in animals and plants. Despite this, identifying miRNAs directly from genome-wide data alone is still a very challenging task. Many miRNAs remain unidentified because of the lack of either sequence information for particular phyla or appropriate algorithms to identify novel miRNAs. The motivation for this work is the discovery of new miRNAs in E. multilocularis based on non-target genomic data only, in order to obtain useful information from currently available, unexplored data. In this work, we present the discovery of new pre-miRNAs in the E. multilocularis genome through a novel approach based on machine learning. We extracted the most commonly used structural features from the folded sequences of the parasite genome: triplets, minimum free energy and sequence length. These features were used to train a novel deep architecture of self-organizing maps (SOMs). This model can be trained with a high class imbalance and without the artificial definition of a negative class. We discovered 886 pre-miRNA candidates within the E. multilocularis genome-wide data. Subsequent experimental validation by small RNA-seq analysis clearly showed 23 pre-miRNA candidates with a pattern compatible with miRNA biogenesis, marking them as high-confidence miRNAs. We discovered new pre-miRNA candidates in E. multilocularis using non-target genomic data only. Predictions were meaningful using only sequence data, with no need for RNA-seq data or target analysis for prediction. Furthermore, the methodology can be easily adapted and applied to any draft genome; draft genomes are the most interesting cases, since most non-model organisms have this status and carry real biological and health relevance.
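The three feature types named in the abstract (triplets, minimum free energy and sequence length) can be illustrated with a short sketch. The abstract does not define the triplet encoding, so the code below assumes the commonly used triplet-element scheme: the nucleotide at a position combined with the paired or unpaired state of that position and its two neighbours, giving 4 x 8 = 32 frequencies. The function names `triplet_features` and `feature_vector` are hypothetical, and the dot-bracket structure and the energy value are assumed to come from an external folding tool that is not called here.

```python
from itertools import product
from typing import Dict, List

NUCLEOTIDES = "ACGU"


def triplet_features(sequence: str, structure: str) -> Dict[str, float]:
    """Return the 32 triplet-element frequencies for a folded sequence.

    Assumption: `structure` is a dot-bracket string of the same length as
    `sequence`; both "(" and ")" are treated as "paired".
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    seq = sequence.upper().replace("T", "U")
    # Collapse the structure to two states: "(" = paired, "." = unpaired.
    paired = "".join("(" if c in "()" else "." for c in structure)

    keys = [n + "".join(p) for n in NUCLEOTIDES for p in product("(.", repeat=3)]
    counts = dict.fromkeys(keys, 0)

    # Slide over every position that has both a left and a right neighbour.
    for i in range(1, len(seq) - 1):
        key = seq[i] + paired[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases such as N
            counts[key] += 1

    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


def feature_vector(sequence: str, structure: str, mfe: float) -> List[float]:
    """Concatenate triplet frequencies, minimum free energy and length."""
    feats = triplet_features(sequence, structure)
    return [feats[k] for k in sorted(feats)] + [mfe, float(len(sequence))]


# Toy hairpin; the energy value is a made-up placeholder, not a real result.
vector = feature_vector("GGGAAACCC", "(((...)))", -1.5)
```

Each candidate hairpin then becomes a 34-dimensional vector that a clustering model such as a self-organizing map could take as input. How the published method scales or weights these features is not stated in this record.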
Affiliation: Kamenetzky, Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Maldonado, Lucas Luciano. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Macchiaroli, Natalia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Investigaciones en Microbiología y Parasitología Médica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en Microbiología y Parasitología Médica; Argentina
Affiliation: Yones, Cristian Ariel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Subject
Machine Learning
Self-Organizing Map
MicroRNA
Echinococcus multilocularis
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/38958

id CONICETDig_e154ec1b319d11d56ff723f61b7c7e94
oai_identifier_str oai:ri.conicet.gov.ar:11336/38958
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
dc.title.none.fl_str_mv MicroRNA discovery in the human parasite Echinococcus multilocularis from genome-wide data
dc.creator.none.fl_str_mv Kamenetzky, Laura
Stegmayer, Georgina
Maldonado, Lucas Luciano
Macchiaroli, Natalia
Yones, Cristian Ariel
Milone, Diego Humberto
dc.subject.none.fl_str_mv Machine Learning
Self Organizing Map
Microrna
Echinococcus Multilocularis
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
https://purl.org/becyt/ford/3.3
https://purl.org/becyt/ford/3
publishDate 2016
dc.date.none.fl_str_mv 2016-04
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/38958
Kamenetzky, Laura; Stegmayer, Georgina; Maldonado, Lucas Luciano; Macchiaroli, Natalia; Yones, Cristian Ariel; et al.; MicroRNA discovery in the human parasite Echinococcus multilocularis from genome-wide data; Academic Press Inc Elsevier Science; Genomics; 107; 6; 4-2016; 274-280
0888-7543
CONICET Digital
CONICET
url http://hdl.handle.net/11336/38958
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/http://www.sciencedirect.com/science/article/pii/S0888754316300234
info:eu-repo/semantics/altIdentifier/doi/10.1016/j.ygeno.2016.04.002
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Academic Press Inc Elsevier Science
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar