Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries
- Autores
- Hoijemberg, Pablo Ariel; Pelczer, István
- Año de publicación
- 2018
- Idioma
- inglés
- Tipo de recurso
- artículo
- Estado
- versión publicada
- Descripción
- A lot of time is spent by researchers in the identification of metabolites in NMR-based metabolomic studies. The usual metabolite identification starts employing public or commercial databases to match chemical shifts thought to belong to a given compound. Statistical total correlation spectroscopy (STOCSY), in use for more than a decade, speeds the process by finding statistical correlations among peaks, being able to create a better peak list as input for the database query. However, the (normally not automated) analysis becomes challenging due to the intrinsic issue of peak overlap, where correlations of more than one compound appear in the STOCSY trace. Here we present a fully automated methodology that analyzes all STOCSY traces at once (every peak is chosen as driver peak) and overcomes the peak overlap obstacle. Peak overlap detection by clustering analysis and sorting of traces (POD-CAST) first creates an overlap matrix from the STOCSY traces, then clusters the overlap traces based on their similarity and finally calculates a cumulative overlap index (COI) to account for both strong and intermediate correlations. This information is gathered in one plot to help the user identify the groups of peaks that would belong to a single molecule and perform a more reliable database query. The simultaneous examination of all traces reduces the time of analysis, compared to viewing STOCSY traces by pairs or small groups, and condenses the redundant information in the 2D STOCSY matrix into bands containing similar traces. The COI helps in the detection of overlapping peaks, which can be added to the peak list from another cross-correlated band. POD-CAST overcomes the generally overlooked and underestimated presence of overlapping peaks and it detects them to include them in the search of all compounds contributing to the peak overlap, enabling the user to accelerate the metabolite identification process with more successful database queries and searching all tentative compounds in the sample set.
Fil: Hoijemberg, Pablo Ariel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Centro de Investigaciones en Bionanociencias "Elizabeth Jares Erijman"; Argentina. University of Princeton; Estados Unidos
Fil: Pelczer, István. University of Princeton; Estados Unidos - Materia
-
Clustering
Complex Mixture
Correlation Matrix
Database
Identification
Metabolite
Metabolomics
Nmr
Overlap
Stocsy - Nivel de accesibilidad
- acceso abierto
- Condiciones de uso
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repositorio
- Institución
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI Identificador
- oai:ri.conicet.gov.ar:11336/50325
Ver los metadatos del registro completo
id |
CONICETDig_1ee744d3872128b3c57cdae80f7b9db5 |
---|---|
oai_identifier_str |
oai:ri.conicet.gov.ar:11336/50325 |
network_acronym_str |
CONICETDig |
repository_id_str |
3498 |
network_name_str |
CONICET Digital (CONICET) |
spelling |
Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database QueriesHoijemberg, Pablo ArielPelczer, IstvánClusteringComplex MixtureCorrelation MatrixDatabaseIdentificationMetaboliteMetabolomicsNmrOverlapStocsyhttps://purl.org/becyt/ford/1.4https://purl.org/becyt/ford/1A lot of time is spent by researchers in the identification of metabolites in NMR-based metabolomic studies. The usual metabolite identification starts employing public or commercial databases to match chemical shifts thought to belong to a given compound. Statistical total correlation spectroscopy (STOCSY), in use for more than a decade, speeds the process by finding statistical correlations among peaks, being able to create a better peak list as input for the database query. However, the (normally not automated) analysis becomes challenging due to the intrinsic issue of peak overlap, where correlations of more than one compound appear in the STOCSY trace. Here we present a fully automated methodology that analyzes all STOCSY traces at once (every peak is chosen as driver peak) and overcomes the peak overlap obstacle. Peak overlap detection by clustering analysis and sorting of traces (POD-CAST) first creates an overlap matrix from the STOCSY traces, then clusters the overlap traces based on their similarity and finally calculates a cumulative overlap index (COI) to account for both strong and intermediate correlations. This information is gathered in one plot to help the user identify the groups of peaks that would belong to a single molecule and perform a more reliable database query. The simultaneous examination of all traces reduces the time of analysis, compared to viewing STOCSY traces by pairs or small groups, and condenses the redundant information in the 2D STOCSY matrix into bands containing similar traces. The COI helps in the detection of overlapping peaks, which can be added to the peak list from another cross-correlated band. POD-CAST overcomes the generally overlooked and underestimated presence of overlapping peaks and it detects them to include them in the search of all compounds contributing to the peak overlap, enabling the user to accelerate the metabolite identification process with more successful database queries and searching all tentative compounds in the sample set.Fil: Hoijemberg, Pablo Ariel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Centro de Investigaciones en Bionanociencias "Elizabeth Jares Erijman"; Argentina. University of Princeton; Estados UnidosFil: Pelczer, István. University of Princeton; Estados UnidosAmerican Chemical Society2018-01info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionhttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdfapplication/pdfhttp://hdl.handle.net/11336/50325Hoijemberg, Pablo Ariel; Pelczer, István; Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries; American Chemical Society; Journal of Proteome Research; 17; 1; 1-2018; 392-4011535-3893CONICET DigitalCONICETenginfo:eu-repo/semantics/altIdentifier/url/https://pubs.acs.org/doi/10.1021/acs.jproteome.7b00617info:eu-repo/semantics/altIdentifier/doi/10.1021/acs.jproteome.7b00617info:eu-repo/semantics/openAccesshttps://creativecommons.org/licenses/by-nc-sa/2.5/ar/reponame:CONICET Digital (CONICET)instname:Consejo Nacional de Investigaciones Científicas y Técnicas2025-09-29T09:44:05Zoai:ri.conicet.gov.ar:11336/50325instacron:CONICETInstitucionalhttp://ri.conicet.gov.ar/Organismo científico-tecnológicoNo correspondehttp://ri.conicet.gov.ar/oai/requestdasensio@conicet.gov.ar; lcarlino@conicet.gov.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:34982025-09-29 09:44:05.631CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicasfalse |
dc.title.none.fl_str_mv |
Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries |
title |
Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries |
spellingShingle |
Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries Hoijemberg, Pablo Ariel Clustering Complex Mixture Correlation Matrix Database Identification Metabolite Metabolomics Nmr Overlap Stocsy |
title_short |
Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries |
title_full |
Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries |
title_fullStr |
Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries |
title_full_unstemmed |
Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries |
title_sort |
Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries |
dc.creator.none.fl_str_mv |
Hoijemberg, Pablo Ariel Pelczer, István |
author |
Hoijemberg, Pablo Ariel |
author_facet |
Hoijemberg, Pablo Ariel Pelczer, István |
author_role |
author |
author2 |
Pelczer, István |
author2_role |
author |
dc.subject.none.fl_str_mv |
Clustering Complex Mixture Correlation Matrix Database Identification Metabolite Metabolomics Nmr Overlap Stocsy |
topic |
Clustering Complex Mixture Correlation Matrix Database Identification Metabolite Metabolomics Nmr Overlap Stocsy |
purl_subject.fl_str_mv |
https://purl.org/becyt/ford/1.4 https://purl.org/becyt/ford/1 |
dc.description.none.fl_txt_mv |
A lot of time is spent by researchers in the identification of metabolites in NMR-based metabolomic studies. The usual metabolite identification starts employing public or commercial databases to match chemical shifts thought to belong to a given compound. Statistical total correlation spectroscopy (STOCSY), in use for more than a decade, speeds the process by finding statistical correlations among peaks, being able to create a better peak list as input for the database query. However, the (normally not automated) analysis becomes challenging due to the intrinsic issue of peak overlap, where correlations of more than one compound appear in the STOCSY trace. Here we present a fully automated methodology that analyzes all STOCSY traces at once (every peak is chosen as driver peak) and overcomes the peak overlap obstacle. Peak overlap detection by clustering analysis and sorting of traces (POD-CAST) first creates an overlap matrix from the STOCSY traces, then clusters the overlap traces based on their similarity and finally calculates a cumulative overlap index (COI) to account for both strong and intermediate correlations. This information is gathered in one plot to help the user identify the groups of peaks that would belong to a single molecule and perform a more reliable database query. The simultaneous examination of all traces reduces the time of analysis, compared to viewing STOCSY traces by pairs or small groups, and condenses the redundant information in the 2D STOCSY matrix into bands containing similar traces. The COI helps in the detection of overlapping peaks, which can be added to the peak list from another cross-correlated band. POD-CAST overcomes the generally overlooked and underestimated presence of overlapping peaks and it detects them to include them in the search of all compounds contributing to the peak overlap, enabling the user to accelerate the metabolite identification process with more successful database queries and searching all tentative compounds in the sample set. Fil: Hoijemberg, Pablo Ariel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Centro de Investigaciones en Bionanociencias "Elizabeth Jares Erijman"; Argentina. University of Princeton; Estados Unidos Fil: Pelczer, István. University of Princeton; Estados Unidos |
description |
A lot of time is spent by researchers in the identification of metabolites in NMR-based metabolomic studies. The usual metabolite identification starts employing public or commercial databases to match chemical shifts thought to belong to a given compound. Statistical total correlation spectroscopy (STOCSY), in use for more than a decade, speeds the process by finding statistical correlations among peaks, being able to create a better peak list as input for the database query. However, the (normally not automated) analysis becomes challenging due to the intrinsic issue of peak overlap, where correlations of more than one compound appear in the STOCSY trace. Here we present a fully automated methodology that analyzes all STOCSY traces at once (every peak is chosen as driver peak) and overcomes the peak overlap obstacle. Peak overlap detection by clustering analysis and sorting of traces (POD-CAST) first creates an overlap matrix from the STOCSY traces, then clusters the overlap traces based on their similarity and finally calculates a cumulative overlap index (COI) to account for both strong and intermediate correlations. This information is gathered in one plot to help the user identify the groups of peaks that would belong to a single molecule and perform a more reliable database query. The simultaneous examination of all traces reduces the time of analysis, compared to viewing STOCSY traces by pairs or small groups, and condenses the redundant information in the 2D STOCSY matrix into bands containing similar traces. The COI helps in the detection of overlapping peaks, which can be added to the peak list from another cross-correlated band. POD-CAST overcomes the generally overlooked and underestimated presence of overlapping peaks and it detects them to include them in the search of all compounds contributing to the peak overlap, enabling the user to accelerate the metabolite identification process with more successful database queries and searching all tentative compounds in the sample set. |
publishDate |
2018 |
dc.date.none.fl_str_mv |
2018-01 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11336/50325 Hoijemberg, Pablo Ariel; Pelczer, István; Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries; American Chemical Society; Journal of Proteome Research; 17; 1; 1-2018; 392-401 1535-3893 CONICET Digital CONICET |
url |
http://hdl.handle.net/11336/50325 |
identifier_str_mv |
Hoijemberg, Pablo Ariel; Pelczer, István; Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries; American Chemical Society; Journal of Proteome Research; 17; 1; 1-2018; 392-401 1535-3893 CONICET Digital CONICET |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/https://pubs.acs.org/doi/10.1021/acs.jproteome.7b00617 info:eu-repo/semantics/altIdentifier/doi/10.1021/acs.jproteome.7b00617 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv |
application/pdf application/pdf |
dc.publisher.none.fl_str_mv |
American Chemical Society |
publisher.none.fl_str_mv |
American Chemical Society |
dc.source.none.fl_str_mv |
reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str |
CONICET Digital (CONICET) |
collection |
CONICET Digital (CONICET) |
instname_str |
Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv |
CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv |
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
_version_ |
1844613387875516416 |
score |
13.070432 |