Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries

Autores
Hoijemberg, Pablo Ariel; Pelczer, István
Año de publicación
2018
Idioma
inglés
Tipo de recurso
artículo
Estado
versión publicada
Descripción
A lot of time is spent by researchers in the identification of metabolites in NMR-based metabolomic studies. The usual metabolite identification starts employing public or commercial databases to match chemical shifts thought to belong to a given compound. Statistical total correlation spectroscopy (STOCSY), in use for more than a decade, speeds the process by finding statistical correlations among peaks, being able to create a better peak list as input for the database query. However, the (normally not automated) analysis becomes challenging due to the intrinsic issue of peak overlap, where correlations of more than one compound appear in the STOCSY trace. Here we present a fully automated methodology that analyzes all STOCSY traces at once (every peak is chosen as driver peak) and overcomes the peak overlap obstacle. Peak overlap detection by clustering analysis and sorting of traces (POD-CAST) first creates an overlap matrix from the STOCSY traces, then clusters the overlap traces based on their similarity and finally calculates a cumulative overlap index (COI) to account for both strong and intermediate correlations. This information is gathered in one plot to help the user identify the groups of peaks that would belong to a single molecule and perform a more reliable database query. The simultaneous examination of all traces reduces the time of analysis, compared to viewing STOCSY traces by pairs or small groups, and condenses the redundant information in the 2D STOCSY matrix into bands containing similar traces. The COI helps in the detection of overlapping peaks, which can be added to the peak list from another cross-correlated band. POD-CAST overcomes the generally overlooked and underestimated presence of overlapping peaks and it detects them to include them in the search of all compounds contributing to the peak overlap, enabling the user to accelerate the metabolite identification process with more successful database queries and searching all tentative compounds in the sample set.
Fil: Hoijemberg, Pablo Ariel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Centro de Investigaciones en Bionanociencias "Elizabeth Jares Erijman"; Argentina. University of Princeton; Estados Unidos
Fil: Pelczer, István. University of Princeton; Estados Unidos
Materia
Clustering
Complex Mixture
Correlation Matrix
Database
Identification
Metabolite
Metabolomics
Nmr
Overlap
Stocsy
Nivel de accesibilidad
acceso abierto
Condiciones de uso
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repositorio
CONICET Digital (CONICET)
Institución
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI Identificador
oai:ri.conicet.gov.ar:11336/50325

id CONICETDig_1ee744d3872128b3c57cdae80f7b9db5
oai_identifier_str oai:ri.conicet.gov.ar:11336/50325
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
spelling Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database QueriesHoijemberg, Pablo ArielPelczer, IstvánClusteringComplex MixtureCorrelation MatrixDatabaseIdentificationMetaboliteMetabolomicsNmrOverlapStocsyhttps://purl.org/becyt/ford/1.4https://purl.org/becyt/ford/1A lot of time is spent by researchers in the identification of metabolites in NMR-based metabolomic studies. The usual metabolite identification starts employing public or commercial databases to match chemical shifts thought to belong to a given compound. Statistical total correlation spectroscopy (STOCSY), in use for more than a decade, speeds the process by finding statistical correlations among peaks, being able to create a better peak list as input for the database query. However, the (normally not automated) analysis becomes challenging due to the intrinsic issue of peak overlap, where correlations of more than one compound appear in the STOCSY trace. Here we present a fully automated methodology that analyzes all STOCSY traces at once (every peak is chosen as driver peak) and overcomes the peak overlap obstacle. Peak overlap detection by clustering analysis and sorting of traces (POD-CAST) first creates an overlap matrix from the STOCSY traces, then clusters the overlap traces based on their similarity and finally calculates a cumulative overlap index (COI) to account for both strong and intermediate correlations. This information is gathered in one plot to help the user identify the groups of peaks that would belong to a single molecule and perform a more reliable database query. The simultaneous examination of all traces reduces the time of analysis, compared to viewing STOCSY traces by pairs or small groups, and condenses the redundant information in the 2D STOCSY matrix into bands containing similar traces. The COI helps in the detection of overlapping peaks, which can be added to the peak list from another cross-correlated band. POD-CAST overcomes the generally overlooked and underestimated presence of overlapping peaks and it detects them to include them in the search of all compounds contributing to the peak overlap, enabling the user to accelerate the metabolite identification process with more successful database queries and searching all tentative compounds in the sample set.Fil: Hoijemberg, Pablo Ariel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Centro de Investigaciones en Bionanociencias "Elizabeth Jares Erijman"; Argentina. University of Princeton; Estados UnidosFil: Pelczer, István. University of Princeton; Estados UnidosAmerican Chemical Society2018-01info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionhttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdfapplication/pdfhttp://hdl.handle.net/11336/50325Hoijemberg, Pablo Ariel; Pelczer, István; Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries; American Chemical Society; Journal of Proteome Research; 17; 1; 1-2018; 392-4011535-3893CONICET DigitalCONICETenginfo:eu-repo/semantics/altIdentifier/url/https://pubs.acs.org/doi/10.1021/acs.jproteome.7b00617info:eu-repo/semantics/altIdentifier/doi/10.1021/acs.jproteome.7b00617info:eu-repo/semantics/openAccesshttps://creativecommons.org/licenses/by-nc-sa/2.5/ar/reponame:CONICET Digital (CONICET)instname:Consejo Nacional de Investigaciones Científicas y Técnicas2025-09-29T09:44:05Zoai:ri.conicet.gov.ar:11336/50325instacron:CONICETInstitucionalhttp://ri.conicet.gov.ar/Organismo científico-tecnológicoNo correspondehttp://ri.conicet.gov.ar/oai/requestdasensio@conicet.gov.ar; lcarlino@conicet.gov.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:34982025-09-29 09:44:05.631CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicasfalse
dc.title.none.fl_str_mv Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries
title Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries
spellingShingle Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries
Hoijemberg, Pablo Ariel
Clustering
Complex Mixture
Correlation Matrix
Database
Identification
Metabolite
Metabolomics
Nmr
Overlap
Stocsy
title_short Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries
title_full Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries
title_fullStr Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries
title_full_unstemmed Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries
title_sort Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries
dc.creator.none.fl_str_mv Hoijemberg, Pablo Ariel
Pelczer, István
author Hoijemberg, Pablo Ariel
author_facet Hoijemberg, Pablo Ariel
Pelczer, István
author_role author
author2 Pelczer, István
author2_role author
dc.subject.none.fl_str_mv Clustering
Complex Mixture
Correlation Matrix
Database
Identification
Metabolite
Metabolomics
Nmr
Overlap
Stocsy
topic Clustering
Complex Mixture
Correlation Matrix
Database
Identification
Metabolite
Metabolomics
Nmr
Overlap
Stocsy
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.4
https://purl.org/becyt/ford/1
dc.description.none.fl_txt_mv A lot of time is spent by researchers in the identification of metabolites in NMR-based metabolomic studies. The usual metabolite identification starts employing public or commercial databases to match chemical shifts thought to belong to a given compound. Statistical total correlation spectroscopy (STOCSY), in use for more than a decade, speeds the process by finding statistical correlations among peaks, being able to create a better peak list as input for the database query. However, the (normally not automated) analysis becomes challenging due to the intrinsic issue of peak overlap, where correlations of more than one compound appear in the STOCSY trace. Here we present a fully automated methodology that analyzes all STOCSY traces at once (every peak is chosen as driver peak) and overcomes the peak overlap obstacle. Peak overlap detection by clustering analysis and sorting of traces (POD-CAST) first creates an overlap matrix from the STOCSY traces, then clusters the overlap traces based on their similarity and finally calculates a cumulative overlap index (COI) to account for both strong and intermediate correlations. This information is gathered in one plot to help the user identify the groups of peaks that would belong to a single molecule and perform a more reliable database query. The simultaneous examination of all traces reduces the time of analysis, compared to viewing STOCSY traces by pairs or small groups, and condenses the redundant information in the 2D STOCSY matrix into bands containing similar traces. The COI helps in the detection of overlapping peaks, which can be added to the peak list from another cross-correlated band. POD-CAST overcomes the generally overlooked and underestimated presence of overlapping peaks and it detects them to include them in the search of all compounds contributing to the peak overlap, enabling the user to accelerate the metabolite identification process with more successful database queries and searching all tentative compounds in the sample set.
Fil: Hoijemberg, Pablo Ariel. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Centro de Investigaciones en Bionanociencias "Elizabeth Jares Erijman"; Argentina. University of Princeton; Estados Unidos
Fil: Pelczer, István. University of Princeton; Estados Unidos
description A lot of time is spent by researchers in the identification of metabolites in NMR-based metabolomic studies. The usual metabolite identification starts employing public or commercial databases to match chemical shifts thought to belong to a given compound. Statistical total correlation spectroscopy (STOCSY), in use for more than a decade, speeds the process by finding statistical correlations among peaks, being able to create a better peak list as input for the database query. However, the (normally not automated) analysis becomes challenging due to the intrinsic issue of peak overlap, where correlations of more than one compound appear in the STOCSY trace. Here we present a fully automated methodology that analyzes all STOCSY traces at once (every peak is chosen as driver peak) and overcomes the peak overlap obstacle. Peak overlap detection by clustering analysis and sorting of traces (POD-CAST) first creates an overlap matrix from the STOCSY traces, then clusters the overlap traces based on their similarity and finally calculates a cumulative overlap index (COI) to account for both strong and intermediate correlations. This information is gathered in one plot to help the user identify the groups of peaks that would belong to a single molecule and perform a more reliable database query. The simultaneous examination of all traces reduces the time of analysis, compared to viewing STOCSY traces by pairs or small groups, and condenses the redundant information in the 2D STOCSY matrix into bands containing similar traces. The COI helps in the detection of overlapping peaks, which can be added to the peak list from another cross-correlated band. POD-CAST overcomes the generally overlooked and underestimated presence of overlapping peaks and it detects them to include them in the search of all compounds contributing to the peak overlap, enabling the user to accelerate the metabolite identification process with more successful database queries and searching all tentative compounds in the sample set.
publishDate 2018
dc.date.none.fl_str_mv 2018-01
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/50325
Hoijemberg, Pablo Ariel; Pelczer, István; Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries; American Chemical Society; Journal of Proteome Research; 17; 1; 1-2018; 392-401
1535-3893
CONICET Digital
CONICET
url http://hdl.handle.net/11336/50325
identifier_str_mv Hoijemberg, Pablo Ariel; Pelczer, István; Fast Metabolite Identification in Nuclear Magnetic Resonance Metabolomic Studies: Statistical Peak Sorting and Peak Overlap Detection for More Reliable Database Queries; American Chemical Society; Journal of Proteome Research; 17; 1; 1-2018; 392-401
1535-3893
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://pubs.acs.org/doi/10.1021/acs.jproteome.7b00617
info:eu-repo/semantics/altIdentifier/doi/10.1021/acs.jproteome.7b00617
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv American Chemical Society
publisher.none.fl_str_mv American Chemical Society
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
_version_ 1844613387875516416
score 13.070432