Impact assessment of the rational selection of training and test sets on the predictive ability of QSAR models

Authors: Andrada, Matias Fernando; Vega Hissi, Esteban Gabriel; Estrada, Mario Rinaldo; Garro Martinez, Juan Ceferino
Publication year: 2017
Language: English
Resource type: article
Status: published version
Description: This study analysed the influence of the rational selection of training and test sets on the quality and predictivity of quantitative structure–activity relationship (QSAR) models. The study was carried out on three different datasets of influenza neuraminidase (H1N1) inhibitors. The three datasets were divided into training and test sets using three rational selection methods (k-means clustering, the Kennard–Stone algorithm, and selection based on activity), and the results were compared with random selection. A total of 31,490 mathematical models were then developed, and the models selected were those with determination coefficients of r2 train > 0.8, r2 loo > 0.7 and r2 test > 0.5, together with the minimum standard deviation (SD) and minimum root-mean-square error (RMS). The selected models were validated internally using the leave-one-out method, and their predictive capacity was evaluated on the external test set. The results indicate that random selection could lead to erroneous results, whereas rational selection allows more reliable conclusions to be drawn. The QSAR models with the greatest predictive power were obtained using the k-means algorithm and selection by activity.
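As an illustration of one of the rational selection methods named in the description, the Kennard–Stone procedure can be sketched in Python as follows. This is a minimal sketch and not the authors' implementation: the function name `kennard_stone_split`, the Euclidean metric, and the toy one-dimensional descriptors are assumptions made here for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kennard_stone_split(X, n_train):
    """Return (train_idx, test_idx) chosen by the Kennard-Stone procedure.

    X is a sequence of descriptor vectors (hypothetical input format);
    n_train is the number of samples to place in the training set.
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")

    # Seed the training set with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist(X[p[0]], X[p[1]]),
    )
    selected = [i0, j0]
    remaining = set(range(n)) - {i0, j0}

    # min_d[k]: distance from sample k to its nearest selected sample.
    min_d = {k: min(dist(X[k], X[i0]), dist(X[k], X[j0])) for k in remaining}

    while len(selected) < n_train:
        # Pick the remaining sample farthest from everything selected so far.
        nxt = max(sorted(remaining), key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_d[k] = min(min_d[k], dist(X[k], X[nxt]))

    # Samples never selected form the test set.
    return selected, sorted(remaining)


# Toy usage with made-up one-dimensional descriptors.
descriptors = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_idx, test_idx = kennard_stone_split(descriptors, n_train=3)
print(train_idx, test_idx)  # [0, 3, 2] [1]
```

Because the procedure spreads the training set across descriptor space, the held-out compounds tend to lie inside the region the model was trained on, which is the motivation usually given for preferring it over a purely random split.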
Affiliation: Andrada, Matias Fernando. Universidad Nacional de San Luis. Facultad de Química, Bioquímica y Farmacia; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Vega Hissi, Esteban Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - San Luis. Instituto Multidisciplinario de Investigaciones Biológicas de San Luis. Universidad Nacional de San Luis. Facultad de Ciencias Físico Matemáticas y Naturales. Instituto Multidisciplinario de Investigaciones Biológicas de San Luis; Argentina
Affiliation: Estrada, Mario Rinaldo. Universidad Nacional de San Luis. Facultad de Química, Bioquímica y Farmacia; Argentina
Affiliation: Garro Martinez, Juan Ceferino. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - San Luis. Instituto Multidisciplinario de Investigaciones Biológicas de San Luis. Universidad Nacional de San Luis. Facultad de Ciencias Físico Matemáticas y Naturales. Instituto Multidisciplinario de Investigaciones Biológicas de San Luis; Argentina
Subject: Based on Activity
K-Means
Kennard–Stone
QSAR
Random Selection
Rational Partition of Dataset
Access level: open access
Terms of use: https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
Institution: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier: oai:ri.conicet.gov.ar:11336/67076

id	CONICETDig_c82740da0eafc64af71748bf5703b987
oai_identifier_str	oai:ri.conicet.gov.ar:11336/67076
network_acronym_str	CONICETDig
repository_id_str	3498
network_name_str	CONICET Digital (CONICET)
dc.title.none.fl_str_mv	Impact assessment of the rational selection of training and test sets on the predictive ability of QSAR models
dc.creator.none.fl_str_mv	Andrada, Matias Fernando; Vega Hissi, Esteban Gabriel; Estrada, Mario Rinaldo; Garro Martinez, Juan Ceferino
author	Andrada, Matias Fernando
author_facet	Andrada, Matias Fernando; Vega Hissi, Esteban Gabriel; Estrada, Mario Rinaldo; Garro Martinez, Juan Ceferino
author_role	author
author2	Vega Hissi, Esteban Gabriel; Estrada, Mario Rinaldo; Garro Martinez, Juan Ceferino
author2_role	author; author; author
dc.subject.none.fl_str_mv	Based on Activity; K-Means; Kennard–Stone; QSAR; Random Selection; Rational Partition of Dataset
topic	Based on Activity; K-Means; Kennard–Stone; QSAR; Random Selection; Rational Partition of Dataset
purl_subject.fl_str_mv	https://purl.org/becyt/ford/1.4 https://purl.org/becyt/ford/1
publishDate	2017
dc.date.none.fl_str_mv	2017-12
dc.type.none.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo
format	article
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://hdl.handle.net/11336/67076 Andrada, Matias Fernando; Vega Hissi, Esteban Gabriel; Estrada, Mario Rinaldo; Garro Martinez, Juan Ceferino; Impact assessment of the rational selection of training and test sets on the predictive ability of QSAR models; Taylor & Francis Ltd; SAR and QSAR in Environmental Research; 28; 12; 12-2017; 1011-1023 1062-936X 1029-046X CONICET Digital CONICET
url	http://hdl.handle.net/11336/67076
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/doi/10.1080/1062936X.2017.1397056 info:eu-repo/semantics/altIdentifier/url/https://www.tandfonline.com/doi/full/10.1080/1062936X.2017.1397056
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv	openAccess
rights_invalid_str_mv	https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv	application/pdf application/pdf
dc.publisher.none.fl_str_mv	Taylor & Francis Ltd
publisher.none.fl_str_mv	Taylor & Francis Ltd
dc.source.none.fl_str_mv	reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str	CONICET Digital (CONICET)
collection	CONICET Digital (CONICET)
instname_str	Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv	CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv	dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
