Application of machine learning to predict unbound drug bioavailability in the brain

Authors
Morales, Juan Francisco; Ruiz, María Esperanza; Stratford, Robert E.; Talevi, Alan
Publication year
2024
Language
English
Resource type
article
Status
published version
Description
Purpose: Optimizing brain bioavailability is highly relevant to the development of drugs targeting the central nervous system. Several pharmacokinetic parameters have been used to measure drug bioavailability in the brain. The most biorelevant among them is possibly the unbound brain-to-plasma partition coefficient, Kpuu,brain,ss, which relates unbound brain and plasma drug concentrations under steady-state conditions. In this study, we developed new in silico models to predict Kpuu,brain,ss. Methods: A manually curated 157-compound dataset was compiled from the literature and split into training and test sets using a clustering approach. Additional models were trained on a refined dataset generated by removing known P-glycoprotein (P-gp) and/or Breast Cancer Resistance Protein substrates from the original dataset. Several supervised machine learning algorithms were tested, including Support Vector Machine, Gradient Boosting Machine, k-nearest neighbors, classificatory Partial Least Squares, Random Forest, Extreme Gradient Boosting, Deep Learning and Linear Discriminant Analysis. Model development followed good practices for predictive Quantitative Structure-Activity Relationship modeling. Results: The best performance on the complete dataset was achieved by Extreme Gradient Boosting, with a test-set accuracy of 85.1%. A similar accuracy estimate was obtained in a prospective validation experiment that compared predicted unbound brain bioavailability with observed experimental data for a small sample of compounds. Conclusion: New in silico models were developed to predict the Kpuu,brain,ss of drug candidates. The dataset used in this study is publicly disclosed so that the models may be reproduced, refined, or expanded as a useful tool to assist drug discovery processes.
Laboratorio de Investigación y Desarrollo de Bioactivos
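The description above mentions three mechanical steps: relating unbound brain and plasma concentrations, treating the prediction as a classification problem, and splitting the dataset with a clustering approach. The following is a minimal Python sketch of those steps, not the authors' pipeline. The function names, the 0.3 class cutoff, the 25% test fraction, and the choice to draw a share of every cluster into the test set are all assumptions made for illustration; the record does not specify them.

```python
import random
from collections import defaultdict


def kpuu_brain(c_u_brain: float, c_u_plasma: float) -> float:
    """Ratio of unbound brain to unbound plasma concentration at steady state.

    Both concentrations must be expressed in the same units.
    """
    if c_u_plasma <= 0:
        raise ValueError("unbound plasma concentration must be positive")
    return c_u_brain / c_u_plasma


def label_compound(kpuu: float, threshold: float = 0.3) -> str:
    """Binary class label. The 0.3 cutoff is an illustrative assumption."""
    return "high" if kpuu >= threshold else "low"


def cluster_split(cluster_ids, test_fraction=0.25, seed=0):
    """Draw roughly test_fraction of each cluster into the test set.

    cluster_ids holds one (hypothetical) cluster label per compound.
    Returns (train_indices, test_indices), both sorted.
    """
    rng = random.Random(seed)

    # Group compound indices by their cluster label.
    members = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)

    train, test = [], []
    for cluster in sorted(members):
        indices = members[cluster][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

Sampling from every cluster keeps both sets spread across the same regions of chemical space; holding out whole clusters instead would give a stricter, extrapolation-style test, and the record does not say which variant was used.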
Subject
Biology
ADME properties
blood-brain barrier
brain bioavailability
central nervous system
machine learning
pharmacokinetics modeling
artificial intelligence
unbound partition coefficient
Access level
open access
Terms of use
http://creativecommons.org/licenses/by/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/167341

URL
http://sedici.unlp.edu.ar/handle/10915/167341
Related identifiers
ISSN: 2674-0338
DOI: 10.3389/fddsv.2024.1360732
License
Creative Commons Attribution 4.0 International (CC BY 4.0)
Format
application/pdf