Impact assessment of the rational selection of training and test sets on the predictive ability of QSAR models
- Authors
- Andrada, Matias Fernando; Vega Hissi, Esteban Gabriel; Estrada, Mario Rinaldo; Garro Martinez, Juan Ceferino
- Publication year
- 2017
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- This study analysed the influence of rational selection of the training and test sets on the quality and predictivity of quantitative structure–activity relationship (QSAR) models. The study was carried out on three different datasets of Influenza Neuraminidase (H1N1) inhibitors. The three datasets were divided into training and test sets using three rational selection methods (k-means clustering, the Kennard–Stone algorithm, and activity-based selection), and the results were compared with random selection. A total of 31,490 mathematical models were then developed, and those with r² train > 0.8, r² LOO > 0.7, and r² test > 0.5, together with the minimum standard deviation (SD) and minimum root-mean-square error (RMS), were selected. The selected models were validated internally using the leave-one-out (LOO) method, and their predictive capacity was evaluated on the external test set. The results indicate that random selection can lead to erroneous results, whereas rational selection yields more reliable conclusions. The QSAR models with the greatest predictive power were obtained using the k-means algorithm and activity-based selection.
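The splitting strategies and acceptance thresholds named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the abstract does not give their exact procedures, so the function names, the one-in-four test fraction in `activity_split`, and the use of plain Euclidean distance on descriptor vectors are all assumptions made for illustration. Only the three r² cut-offs come from the abstract.

```python
import math


def kennard_stone(X, n_train):
    """Return (train_idx, test_idx) chosen with the Kennard-Stone max-min rule.

    X is a list of descriptor vectors (hypothetical input format).
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")
    # Pairwise Euclidean distances between descriptor vectors.
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Seed the training set with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: dist[p[0]][p[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_train:
        # Add the candidate whose nearest selected sample is farthest away.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


def activity_split(y, step=4):
    """Rank samples by activity and send every `step`-th one to the test set.

    The 1-in-`step` scheme is an assumption, not taken from the paper.
    """
    order = sorted(range(len(y)), key=lambda k: y[k])
    test = order[step // 2::step]
    train = [k for k in order if k not in test]
    return train, test


def passes_thresholds(r2_train, r2_loo, r2_test):
    """Apply the acceptance cut-offs quoted in the abstract."""
    return r2_train > 0.8 and r2_loo > 0.7 and r2_test > 0.5


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [10.0]]
    print(kennard_stone(X, 3))
    print(activity_split([5.0, 1.0, 3.0, 2.0, 4.0, 6.0, 0.5, 7.0]))
    print(passes_thresholds(0.85, 0.75, 0.60))
```

Both rational splits are deterministic for a given dataset, whereas a random split changes with the seed, which is one plausible reason a single random partition can give a misleading picture of model quality.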
Affiliation: Andrada, Matias Fernando. Universidad Nacional de San Luis. Facultad de Química, Bioquímica y Farmacia; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Vega Hissi, Esteban Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - San Luis. Instituto Multidisciplinario de Investigaciones Biológicas de San Luis. Universidad Nacional de San Luis. Facultad de Ciencias Físico Matemáticas y Naturales. Instituto Multidisciplinario de Investigaciones Biológicas de San Luis; Argentina
Affiliation: Estrada, Mario Rinaldo. Universidad Nacional de San Luis. Facultad de Química, Bioquímica y Farmacia; Argentina
Affiliation: Garro Martinez, Juan Ceferino. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - San Luis. Instituto Multidisciplinario de Investigaciones Biológicas de San Luis. Universidad Nacional de San Luis. Facultad de Ciencias Físico Matemáticas y Naturales. Instituto Multidisciplinario de Investigaciones Biológicas de San Luis; Argentina
- Subject
- Based on Activity; K-Means; Kennard–Stone; QSAR; Random Selection; Rational Partition of Dataset
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/67076
| Field | Value |
|---|---|
| id | CONICETDig_c82740da0eafc64af71748bf5703b987 |
| oai_identifier_str | oai:ri.conicet.gov.ar:11336/67076 |
| network_acronym_str | CONICETDig |
| repository_id_str | 3498 |
| network_name_str | CONICET Digital (CONICET) |
| dc.title.none.fl_str_mv | Impact assessment of the rational selection of training and test sets on the predictive ability of QSAR models |
| dc.creator.none.fl_str_mv | Andrada, Matias Fernando; Vega Hissi, Esteban Gabriel; Estrada, Mario Rinaldo; Garro Martinez, Juan Ceferino |
| author | Andrada, Matias Fernando |
| author_role | author |
| author2 | Vega Hissi, Esteban Gabriel; Estrada, Mario Rinaldo; Garro Martinez, Juan Ceferino |
| author2_role | author; author; author |
| dc.subject.none.fl_str_mv | Based on Activity; K-Means; Kennard–Stone; QSAR; Random Selection; Rational Partition of Dataset |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.4; https://purl.org/becyt/ford/1 |
| publishDate | 2017 |
| dc.date.none.fl_str_mv | 2017-12 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| url | http://hdl.handle.net/11336/67076 |
| identifier_str_mv | Andrada, Matias Fernando; Vega Hissi, Esteban Gabriel; Estrada, Mario Rinaldo; Garro Martinez, Juan Ceferino; Impact assessment of the rational selection of training and test sets on the predictive ability of QSAR models; Taylor & Francis Ltd; SAR and QSAR in Environmental Research; 28; 12; 12-2017; 1011-1023 1062-936X 1029-046X CONICET Digital CONICET |
| dc.language.none.fl_str_mv | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/doi/10.1080/1062936X.2017.1397056; info:eu-repo/semantics/altIdentifier/url/https://www.tandfonline.com/doi/full/10.1080/1062936X.2017.1397056 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Taylor & Francis Ltd |
| dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET); instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
| _version_ | 1848597517916176384 |
| score | 13.24909 |