Hybridizing Feature Selection and Feature Learning Approaches in QSAR Modeling for Drug Discovery

Authors
Ponzoni, Ignacio; Sebastián Pérez, Víctor; Requena Triguero, Carlos; Roca, Carlos; Martínez, María Jimena; Cravero, Fiorella; Diaz, Monica Fatima; Páez, Juan A.; Gómez Arrayás, Ramón; Adrio, Javier; Campillo, Nuria E.
Publication year
2017
Language
English
Resource type
article
Status
published version
Description
Quantitative structure–activity relationship (QSAR) modeling using machine learning techniques is a complex computational problem in which identifying the most informative molecular descriptors for predicting a specific target property plays a critical role. Two general approaches can be used for this modeling procedure: feature selection and feature learning. In this paper, a comparative performance study of two state-of-the-art methods, one from each approach, is carried out. In particular, regression and classification models for three different problems are inferred using both methods under different experimental scenarios: two drug-like properties (blood-brain barrier and human intestinal absorption) and enantiomeric excess, a measure of purity used for chiral substances. Beyond the contrastive analysis of feature selection and feature learning methods as competing approaches, the hybridization of these strategies is also evaluated, based on previous results obtained in materials science. From the experimental results, it can be concluded that there is no clear winner between the two approaches, because performance depends on the characteristics of the compound databases used for modeling. Nevertheless, in several cases the accuracy of the models improved when both approaches were combined, provided that the molecular descriptor sets produced by feature selection and feature learning contain complementary information.
Fil: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Fil: Sebastián Pérez, Víctor. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; España
Fil: Requena Triguero, Carlos. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; España
Fil: Roca, Carlos. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; España
Fil: Martínez, María Jimena. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Fil: Cravero, Fiorella. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
Fil: Diaz, Monica Fatima. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
Fil: Páez, Juan A.. Consejo Superior de Investigaciones Científicas. Instituto de Química Médica; España
Fil: Gómez Arrayás, Ramón. Universidad Autónoma de Madrid; España
Fil: Adrio, Javier. Universidad Autónoma de Madrid; España. Institute for Advanced Research in Chemical Sciences; España
Fil: Campillo, Nuria E.. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; España
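The hybridization described in the abstract, combining the descriptor set chosen by feature selection with the one produced by feature learning, can be illustrated with a minimal sketch. It does not reproduce the methods used in the paper: the correlation filter, the first principal component computed by power iteration (a simple stand-in for a feature learning method), the synthetic data, and all function names are assumptions made for illustration only.

```python
import math
import random

def column(X, j):
    """Return descriptor column j of the data matrix X (list of rows)."""
    return [row[j] for row in X]

def pearson(a, b):
    """Pearson correlation between two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def select_features(X, y, k):
    """Hypothetical filter-style selection: the k descriptors most correlated with y."""
    scores = [abs(pearson(column(X, j), y)) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def learn_feature(X, iters=200):
    """Stand-in for feature learning: scores on the first principal component,
    found by power iteration on the centered data (uniform start vector)."""
    n, d = len(X), len(X[0])
    means = [sum(column(X, j)) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    w = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        proj = [sum(r[j] * w[j] for j in range(d)) for r in Xc]
        w_new = [sum(Xc[i][j] * proj[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(v * v for v in w_new))
        if norm == 0:
            break
        w = [v / norm for v in w_new]
    return [sum(r[j] * w[j] for j in range(d)) for r in Xc]

def hybrid_descriptors(X, y, k):
    """Concatenate the selected original descriptors with the learned feature."""
    selected = select_features(X, y, k)
    learned = learn_feature(X)
    return [[row[j] for j in selected] + [z] for row, z in zip(X, learned)]

# Synthetic data (assumption): 50 "compounds", 5 descriptors, target driven by descriptor 0.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(50)]
y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]

selected = select_features(X, y, 2)
H = hybrid_descriptors(X, y, 2)
print("selected descriptor indices:", selected)
print("hybrid matrix shape:", len(H), "x", len(H[0]))
```

The hybrid matrix can then be passed to any regression or classification learner. Whether it helps depends, as the abstract notes, on whether the two descriptor sets carry complementary information.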
Subject
MACHINE LEARNING
QSAR
FEATURE SELECTION
FEATURE LEARNING
Access level
open access
Terms of use
https://creativecommons.org/licenses/by/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/45651

id CONICETDig_a54fa611a9c75cb5727c7509d30010ac
oai_identifier_str oai:ri.conicet.gov.ar:11336/45651
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2017
dc.date.none.fl_str_mv 2017-05-25
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/45651
Ponzoni, Ignacio; Sebastián Pérez, Víctor; Requena Triguero, Carlos; Roca, Carlos; Martínez, María Jimena; et al.; Hybridizing Feature Selection and Feature Learning Approaches in QSAR Modeling for Drug Discovery; Nature Publishing Group; Scientific Reports; 7; 1; 25-5-2017; 1-19
2045-2322
CONICET Digital
CONICET
url http://hdl.handle.net/11336/45651
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://www.nature.com/articles/s41598-017-02114-3
info:eu-repo/semantics/altIdentifier/doi/10.1038/s41598-017-02114-3
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
application/pdf
dc.publisher.none.fl_str_mv Nature Publishing Group
publisher.none.fl_str_mv Nature Publishing Group
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar