Impact assessment of the rational selection of training and test sets on the predictive ability of QSAR models

Authors
Andrada, Matias Fernando; Vega Hissi, Esteban Gabriel; Estrada, Mario Rinaldo; Garro Martinez, Juan Ceferino
Publication year
2017
Language
English
Resource type
article
Status
published version
Description
This study analysed how the rational selection of training and test sets influences the quality and predictivity of quantitative structure–activity relationship (QSAR) models. The study was carried out on three different datasets of influenza neuraminidase (H1N1) inhibitors. Each dataset was divided into training and test sets using three rational selection methods (k-means clustering, the Kennard–Stone algorithm and selection by activity), and the results were compared with random selection. A total of 31,490 mathematical models were then developed, and those with determination coefficients of r² train > 0.8, r² LOO > 0.7 and r² test > 0.5, together with minimum standard deviation (SD) and minimum root-mean-square error (RMS), were selected. The selected models were validated internally using the leave-one-out (LOO) method, and their predictive capacity was evaluated with the external test set. The results indicate that random selection could lead to erroneous results, whereas rational selection allows more reliable conclusions to be drawn. The QSAR models with the greatest predictive power were found using the k-means algorithm and selection by activity.
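As an illustration of two of the rational partitioning methods named in the description, the sketch below implements the standard Kennard–Stone selection and one possible reading of "selection by activity" (sort by activity, then send every fourth compound to the test set). It is not the authors' code: the function names, the Euclidean distance, and the `step` parameter are assumptions made for this example, and the record does not state how the activity-based split was actually performed.

```python
import math


def kennard_stone(X, n_train):
    """Return indices of n_train rows of X chosen by Kennard-Stone.

    X is a sequence of descriptor vectors (equal-length tuples or lists).
    Euclidean distance is an assumption of this sketch.
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Seed the training set with the two most distant compounds.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist[p[0]][p[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_train:
        # Add the candidate whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


def activity_split(y, step=4):
    """Hypothetical activity-based split: sort compounds by activity and
    assign every `step`-th one to the test set, the rest to training."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    test = order[step // 2::step]
    test_set = set(test)
    train = [i for i in order if i not in test_set]
    return train, test
```

Kennard–Stone returns the training indices; the compounds it leaves out form the test set. Both functions depend only on the descriptors or activities, so they give the same partition on every run, unlike a random split.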
Affiliation: Andrada, Matias Fernando. Universidad Nacional de San Luis. Facultad de Química, Bioquímica y Farmacia; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Vega Hissi, Esteban Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - San Luis. Instituto Multidisciplinario de Investigaciones Biológicas de San Luis. Universidad Nacional de San Luis. Facultad de Ciencias Físico Matemáticas y Naturales. Instituto Multidisciplinario de Investigaciones Biológicas de San Luis; Argentina
Affiliation: Estrada, Mario Rinaldo. Universidad Nacional de San Luis. Facultad de Química, Bioquímica y Farmacia; Argentina
Affiliation: Garro Martinez, Juan Ceferino. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - San Luis. Instituto Multidisciplinario de Investigaciones Biológicas de San Luis. Universidad Nacional de San Luis. Facultad de Ciencias Físico Matemáticas y Naturales. Instituto Multidisciplinario de Investigaciones Biológicas de San Luis; Argentina
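The acceptance criteria quoted in the description (determination coefficients above 0.8 for training, 0.7 for leave-one-out and 0.5 for the external test set) can be sketched for the simplest case of a one-descriptor linear model. This is a hedged illustration, not the procedure used in the article: the function names are hypothetical, the single-descriptor model is a simplification, and the choice of reference mean in each r² is an assumption, since the record does not give the exact formulas.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def r_squared(y_obs, y_pred):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of y_obs
    (an assumption; other conventions exist for external test sets)."""
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def loo_r2(x, y):
    """Leave-one-out r2: refit without each compound, then predict it."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return r_squared(y, preds)


def passes_thresholds(x_tr, y_tr, x_te, y_te):
    """Apply the three r2 cut-offs quoted in the description."""
    a, b = fit_line(x_tr, y_tr)
    r2_train = r_squared(y_tr, [a + b * xi for xi in x_tr])
    r2_loo = loo_r2(x_tr, y_tr)
    r2_test = r_squared(y_te, [a + b * xi for xi in x_te])
    return r2_train > 0.8 and r2_loo > 0.7 and r2_test > 0.5
```

The inputs are plain Python lists of descriptor values and activities. A model that fits the training set well but mispredicts the external compounds fails the third condition, which is the situation the description associates with an unlucky random split.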
Subject
Based on Activity
K-Means
Kennard–Stone
QSAR
Random Selection
Rational Partition of Dataset
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/67076

id CONICETDig_c82740da0eafc64af71748bf5703b987
oai_identifier_str oai:ri.conicet.gov.ar:11336/67076
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.4
https://purl.org/becyt/ford/1
dc.date.none.fl_str_mv 2017-12
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/67076
Andrada, Matias Fernando; Vega Hissi, Esteban Gabriel; Estrada, Mario Rinaldo; Garro Martinez, Juan Ceferino; Impact assessment of the rational selection of training and test sets on the predictive ability of QSAR models; Taylor & Francis Ltd; SAR and QSAR in Environmental Research; 28; 12; 12-2017; 1011-1023
1062-936X
1029-046X
CONICET Digital
CONICET
url http://hdl.handle.net/11336/67076
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/doi/10.1080/1062936X.2017.1397056
info:eu-repo/semantics/altIdentifier/url/https://www.tandfonline.com/doi/full/10.1080/1062936X.2017.1397056
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv Taylor & Francis Ltd
publisher.none.fl_str_mv Taylor & Francis Ltd
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar