An introduction to deep learning on biological sequence data: Examples and solutions

Authors
Jurtz, Vanessa Isabell; Johansen, Alexander Rosenberg; Nielsen, Morten; Almagro Armenteros, Jose Juan; Nielsen, Henrik; Sønderby, Casper Kaae; Winther, Ole; Sønderby, Søren Kaae
Publication year
2017
Language
English
Resource type
article
Status
published version
Description
Motivation: Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools in recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy-to-use libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition, and the development of tools, applications and code examples is in most cases centered within this field rather than within biology.
Results: Here, we aim to further the development of deep learning methods within biology by providing application examples and code templates that are ready to apply and adapt. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules.
Availability and implementation: All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation: Jurtz, Vanessa Isabell. Technical University of Denmark; Denmark
Affiliation: Johansen, Alexander Rosenberg. Technical University of Denmark; Denmark
Affiliation: Nielsen, Morten. Technical University of Denmark; Denmark. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); Argentina
Affiliation: Almagro Armenteros, Jose Juan. Technical University of Denmark; Denmark
Affiliation: Nielsen, Henrik. Technical University of Denmark; Denmark
Affiliation: Sønderby, Casper Kaae. University of Copenhagen; Denmark
Affiliation: Winther, Ole. University of Copenhagen; Denmark
Affiliation: Sønderby, Søren Kaae. University of Copenhagen; Denmark
Subject
Machine Learning
Biology
Sequence
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/66355

Subject classification
https://purl.org/becyt/ford/3.3
https://purl.org/becyt/ford/3
Publication date
2017-11
Handle
http://hdl.handle.net/11336/66355
Citation
Jurtz, Vanessa Isabell; Johansen, Alexander Rosenberg; Nielsen, Morten; Almagro Armenteros, Jose Juan; Nielsen, Henrik; et al.; An introduction to deep learning on biological sequence data: Examples and solutions; Oxford University Press; Bioinformatics (Oxford, England); vol. 33, no. 22; 11-2017; pp. 3685-3690
ISSN
1367-4803
1460-2059
Related identifiers
DOI: 10.1093/bioinformatics/btx531
https://academic.oup.com/bioinformatics/article-abstract/33/22/3685/4092933
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870575/
Publisher
Oxford University Press