Fortuitous Correlations in Molecular Dynamics Simulations: Their Harmful Influence on the Probability Distributions of the Main Principal Components
- Authors
- Palma, Juliana Isabel; Pierdominici Sottile, Gustavo
- Publication year
- 2024
- Language
- English
- Resource type
- article
- Status
- published version
- Description
Nonsense correlations frequently develop between independent random variables that evolve with time. It is therefore not surprising that they appear between the components of vectors carrying out multidimensional random walks, such as those describing the trajectories of biomolecules in molecular dynamics simulations. The existence of these correlations is not in itself a problem. It can become one, however, when the trajectories are analyzed with an algorithm such as principal component analysis (PCA), because PCA seeks to maximize correlations without discriminating whether they have a physical origin. In this Article, we employ random walks on multidimensional harmonic potentials to evaluate the influence of fortuitous correlations on PCA. We demonstrate that, because of them, the algorithm gives misleading results when applied to a single trajectory. The errors affect not only the directions of the first eigenvectors and their eigenvalues; the very definition of the molecule’s “essential space” may be wrong. Additionally, the probability distributions of the main principal components present artificial structures that do not correspond to the shape of the potential energy surface. Finally, we show that the PCA of two realistic protein models, human serum albumin and lysozyme, behaves similarly to that of the simple harmonic models. In all cases, the problems can be mitigated and eventually eliminated by performing PCA on concatenated trajectories formed from a large enough number of individual simulations.
Affiliation: Palma, Juliana Isabel. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Pierdominici Sottile, Gustavo. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
- Subject
- Molecular dynamics
- Principal component analysis
- Configurational entropy
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/234845
- Subject classification
- https://purl.org/becyt/ford/1.4
- https://purl.org/becyt/ford/1
- Handle
- http://hdl.handle.net/11336/234845
- Citation
- Palma, Juliana Isabel; Pierdominici Sottile, Gustavo; Fortuitous Correlations in Molecular Dynamics Simulations: Their Harmful Influence on the Probability Distributions of the Main Principal Components; American Chemical Society; ACS Omega; 4-2024; 1-14
- ISSN
- 2470-1343
- DOI
- 10.1021/acsomega.4c01515
- Article URL
- https://pubs.acs.org/doi/10.1021/acsomega.4c01515
- Publisher
- American Chemical Society
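The effect summarized in the description can be illustrated with a small numerical sketch. This is not the authors' code or model: it assumes an isotropic harmonic well, a simple discrete random walk with a restoring term, walks that all start at the minimum, and hypothetical names and parameters (`harmonic_walks`, `pca_summary`, `k`, `sigma`, 10 dimensions, 100 trajectories of 1000 steps). Because the coordinates are independent and equivalent by construction, an ideal PCA would give equal eigenvalues and zero cross-correlations; any departure from that is fortuitous.

```python
import numpy as np


def harmonic_walks(n_traj, n_steps, n_dim, k=2e-3, sigma=1.0, seed=0):
    """Independent random walks in an isotropic harmonic well (assumed model).

    Each coordinate follows x[t] = (1 - k) * x[t-1] + noise, so the
    coordinates are statistically independent by construction.
    Returns an array of shape (n_traj, n_steps, n_dim).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros((n_traj, n_steps, n_dim))  # every walk starts at the minimum
    for t in range(1, n_steps):
        kick = rng.normal(scale=sigma, size=(n_traj, n_dim))
        x[:, t] = (1.0 - k) * x[:, t - 1] + kick
    return x


def pca_summary(frames):
    """PCA of a (n_frames, n_dim) array via its covariance matrix."""
    cov = np.cov(frames, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # descending order
    corr = np.corrcoef(frames, rowvar=False)
    off_diag = np.abs(corr[~np.eye(len(corr), dtype=bool)])
    return {
        # Ratio of the largest eigenvalue to the mean eigenvalue;
        # it tends to 1 for a well-sampled isotropic distribution.
        "top_over_mean": eigvals[0] / eigvals.mean(),
        # Largest absolute correlation between two different coordinates;
        # it tends to 0 for well-sampled independent coordinates.
        "max_abs_corr": off_diag.max(),
    }


walks = harmonic_walks(n_traj=100, n_steps=1000, n_dim=10)
single = pca_summary(walks[0])                            # one trajectory
pooled = pca_summary(walks.reshape(-1, walks.shape[-1]))  # all concatenated
print("single trajectory:", single)
print("100 concatenated: ", pooled)
```

With these assumed settings each trajectory spans only about two relaxation times (1/`k` = 500 steps), so a single run samples the well poorly and PCA reports an apparently dominant first component, whereas the concatenated set moves both indicators toward their ideal values.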