Fortuitous Correlations in Molecular Dynamics Simulations: Their Harmful Influence on the Probability Distributions of the Main Principal Components

Authors
Palma, Juliana Isabel; Pierdominici Sottile, Gustavo
Year of publication
2024
Language
English
Resource type
article
Status
published version
Description
Nonsense correlations frequently develop between independent random variables that evolve with time. It is therefore not surprising that they appear between the components of vectors performing multidimensional random walks, such as those describing the trajectories of biomolecules in molecular dynamics simulations. These correlations are not a problem in themselves. They can become one, however, when the trajectories are analyzed with an algorithm such as Principal Component Analysis (PCA), because PCA seeks to maximize correlations without discriminating whether or not they have a physical origin. In this Article, we employ random walks on multidimensional harmonic potentials to evaluate the influence of fortuitous correlations on PCA. We demonstrate that, because of them, the algorithm yields misleading results when applied to a single trajectory. The errors affect not only the directions of the first eigenvectors and their eigenvalues; the very definition of the molecule’s “essential space” may also be wrong. Additionally, the probability distributions of the main principal components display artificial structures that do not correspond to the shape of the potential energy surface. Finally, we show that PCA of two realistic protein models, human serum albumin and lysozyme, behaves similarly to that of the simple harmonic models. In all cases, the problems can be mitigated, and eventually eliminated, by performing PCA on concatenated trajectories formed from a sufficiently large number of individual simulations.
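The abstract's central point, that coordinates which are independent can still pick up sizeable fortuitous correlations along a single trajectory and thereby distort PCA, can be reproduced in a toy model. The following is a minimal sketch, not the authors' code: it assumes a two-dimensional isotropic harmonic well sampled by a discretized Ornstein-Uhlenbeck (overdamped Langevin) update, and the function names `ou_trajectory`, `pca_2d` and `compare`, as well as the parameters `a`, `n_steps`, `n_traj` and `seed`, are hypothetical. Because the true covariance matrix is the identity, both PCA eigenvalues should be equal and the x-y correlation should be zero; any departure from that is an artifact of finite sampling.

```python
import math
import random
import statistics


def ou_trajectory(n_steps, a, rng):
    """One 2D trajectory in an isotropic harmonic well, integrated as a
    discretized Ornstein-Uhlenbeck process x <- a*x + sqrt(1 - a^2)*noise.
    The x and y coordinates are independent by construction and both have
    stationary variance 1, so the true covariance matrix is the identity."""
    noise = math.sqrt(1.0 - a * a)
    x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # start at equilibrium
    frames = []
    for _ in range(n_steps):
        x = a * x + noise * rng.gauss(0.0, 1.0)
        y = a * y + noise * rng.gauss(0.0, 1.0)
        frames.append((x, y))
    return frames


def pca_2d(frames):
    """Return the covariance eigenvalues (largest first) and the Pearson
    correlation between x and y, computed about the mean of `frames`."""
    n = len(frames)
    mx = sum(x for x, _ in frames) / n
    my = sum(y for _, y in frames) / n
    sxx = sum((x - mx) ** 2 for x, _ in frames) / n
    syy = sum((y - my) ** 2 for _, y in frames) / n
    sxy = sum((x - mx) * (y - my) for x, y in frames) / n
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    centre = 0.5 * (sxx + syy)
    half_gap = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    corr = sxy / math.sqrt(sxx * syy)
    return (centre + half_gap, centre - half_gap), corr


def compare(n_traj=300, n_steps=200, a=0.99, seed=1):
    """PCA of each short trajectory alone versus PCA of their concatenation."""
    rng = random.Random(seed)
    trajs = [ou_trajectory(n_steps, a, rng) for _ in range(n_traj)]
    single = [pca_2d(t) for t in trajs]
    pooled = pca_2d([frame for t in trajs for frame in t])
    return single, pooled


if __name__ == "__main__":
    single, ((l1, l2), r) = compare()
    med_r = statistics.median(abs(c) for _, c in single)
    med_ratio = statistics.median(ev[0] / ev[1] for ev, _ in single)
    print(f"single trajectories: median |r| = {med_r:.2f}, "
          f"median lambda1/lambda2 = {med_ratio:.2f}")
    print(f"concatenated:        |r| = {abs(r):.2f}, "
          f"lambda1/lambda2 = {l1 / l2:.2f}")
```

Running the script prints the median |r| and eigenvalue ratio over the individual trajectories next to the values for their concatenation. In this toy setting, lengthening `n_steps` relative to the relaxation time (roughly 1/(1 - a) steps) also shrinks the spurious correlations; concatenating many independent runs, as the abstract recommends, reaches the same end with shorter simulations.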
Affiliation: Palma, Juliana Isabel. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Pierdominici Sottile, Gustavo. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Subjects
Molecular dynamics
Principal component analysis
Configurational entropy
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/234845
Publisher
American Chemical Society
Journal
ACS Omega, April 2024, pp. 1-14
ISSN
2470-1343
DOI
10.1021/acsomega.4c01515
Publisher URL
https://pubs.acs.org/doi/10.1021/acsomega.4c01515
Handle
http://hdl.handle.net/11336/234845
Format
application/pdf