Hybridizing Feature Selection and Feature Learning Approaches in QSAR Modeling for Drug Discovery
- Authors
- Ponzoni, Ignacio; Sebastián Pérez, Víctor; Requena Triguero, Carlos; Roca, Carlos; Martínez, María Jimena; Cravero, Fiorella; Diaz, Monica Fatima; Páez, Juan A.; Gómez Arrayás, Ramón; Adrio, Javier; Campillo, Nuria E.
- Year of publication
- 2017
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Quantitative structure–activity relationship (QSAR) modeling with machine learning techniques is a complex computational problem in which identifying the most informative molecular descriptors for predicting a specific target property plays a critical role. Two general approaches can be used for this modeling procedure: feature selection and feature learning. In this paper, a comparative performance study of two state-of-the-art methods, one representing each approach, is carried out. In particular, regression and classification models for three different problems are inferred with both methods under different experimental scenarios: two drug-like properties, blood–brain barrier permeation and human intestinal absorption, and enantiomeric excess, a measure of purity for chiral substances. Beyond contrasting feature selection and feature learning as competing approaches, the hybridization of these strategies is also evaluated, building on previous results obtained in materials science. The experimental results show that there is no clear winner between the two approaches, because performance depends on the characteristics of the compound databases used for modeling. Nevertheless, in several cases the accuracy of the models could be improved by combining both approaches when the molecular descriptor sets provided by feature selection and feature learning contained complementary information.
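The abstract describes hybridization only as combining the descriptor sets produced by the two approaches; it does not name the methods used. As a rough illustration, assuming the hybrid set is simply the union of selected original descriptors and learned features, the sketch below uses a correlation filter as a stand-in for feature selection and the first principal component as a stand-in for feature learning. Neither stand-in is the method evaluated in the paper, and all function names and the toy data are hypothetical.

```python
import math

# Hypothetical sketch: these stand-ins are NOT the methods evaluated in the paper.


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_descriptors(X, y, k):
    """Feature selection stand-in (filter method): indices of the k descriptor
    columns with the largest |Pearson r| against the target property."""
    columns = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def learn_feature(X, iters=200):
    """Feature learning stand-in: each compound's score on the first principal
    component, found by power iteration on the descriptor covariance matrix."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # centered data
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0:
            break
        v = [a / norm for a in w]
    return [sum(r[j] * v[j] for j in range(d)) for r in Xc]  # projection onto v


def hybrid_descriptors(X, y, k):
    """Assumed hybrid set: the k selected original descriptors plus the learned feature."""
    selected = select_descriptors(X, y, k)
    learned = learn_feature(X)
    return [[row[j] for j in selected] + [z] for row, z in zip(X, learned)]


# Toy data (hypothetical): 10 compounds x 3 descriptors; target = 3 * descriptor 0.
X = [[i, 2 * i + i % 2, (i * 7) % 5] for i in range(10)]
y = [3 * i for i in range(10)]
H = hybrid_descriptors(X, y, k=2)
print(select_descriptors(X, y, 2))  # [0, 1]
print(len(H[0]))  # 3
```

Concatenation is the simplest way to combine the two sets; whether it helps depends, as the abstract notes, on whether the learned features carry information the selected descriptors do not.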
Affiliation: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Sebastián Pérez, Víctor. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain
Affiliation: Requena Triguero, Carlos. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain
Affiliation: Roca, Carlos. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain
Affiliation: Martínez, María Jimena. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Cravero, Fiorella. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
Affiliation: Diaz, Monica Fatima. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
Affiliation: Páez, Juan A. Consejo Superior de Investigaciones Científicas. Instituto de Química Médica; Spain
Affiliation: Gómez Arrayás, Ramón. Universidad Autónoma de Madrid; Spain
Affiliation: Adrio, Javier. Universidad Autónoma de Madrid; Spain. Institute for Advanced Research in Chemical Sciences; Spain
Affiliation: Campillo, Nuria E. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain
- Subjects
- MACHINE LEARNING, QSAR, FEATURE SELECTION, FEATURE LEARNING
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/45651
- Publication date
- 2017-05-25
- Handle
- http://hdl.handle.net/11336/45651
- Citation
- Ponzoni, Ignacio; Sebastián Pérez, Víctor; Requena Triguero, Carlos; Roca, Carlos; Martínez, María Jimena; et al.; Hybridizing Feature Selection and Feature Learning Approaches in QSAR Modeling for Drug Discovery; Nature Publishing Group; Scientific Reports; 7; 1; 25-5-2017; 1-19
- ISSN
- 2045-2322
- Publisher version
- https://www.nature.com/articles/s41598-017-02114-3
- DOI
- 10.1038/s41598-017-02114-3
- Publisher
- Nature Publishing Group