An efficient, not-only-linear correlation coefficient based on clustering

Authors
Pividori, Milton Damián; Ritchie, Marylyn D.; Milone, Diego Humberto; Greene, Casey S.
Publication year
2024
Language
English
Resource type
article
Status
published version
Description
Identifying meaningful patterns in data is crucial for understanding complex biological processes, particularly in transcriptomics, where genes with correlated expression often share functions or contribute to disease mechanisms. Traditional correlation coefficients, which primarily capture linear relationships, may overlook important nonlinear patterns. We introduce the clustermatch correlation coefficient (CCC), a not-only-linear coefficient that utilizes clustering to efficiently detect both linear and nonlinear associations. CCC outperforms standard methods by revealing biologically meaningful patterns that linear-only coefficients miss and is faster than state-of-the-art coefficients such as the maximal information coefficient. When applied to human gene expression data from genotype-tissue expression (GTEx), CCC identified robust linear relationships and nonlinear patterns, such as sex-specific differences, that are undetectable by standard methods. Highly ranked gene pairs were enriched for interactions in integrated networks built from protein-protein interactions, transcription factor regulation, and chemical and genetic perturbations, suggesting that CCC can detect functional relationships missed by linear-only approaches. CCC is a highly efficient, next-generation, not-only-linear correlation coefficient for genome-scale data. A record of this paper’s transparent peer review process is included in the supplemental information.
Affiliation: Pividori, Milton Damián. University of Colorado; United States. University of Pennsylvania; United States. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Ritchie, Marylyn D. University of Pennsylvania; United States
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Greene, Casey S. University of Colorado; United States
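The description says only that CCC "utilizes clustering" to detect linear and nonlinear associations; it does not give the algorithm. The following is a minimal sketch of one way a clustering-based coefficient could work, under stated assumptions: each variable is split into rank-based bins for several bin counts, and the coefficient is the highest adjusted Rand index between any pair of partitions. The function names, the bin-count range, and the tie handling are all hypothetical and are not taken from this record or from the authors' software.

```python
from collections import Counter
from math import comb


def quantile_partition(values, k):
    """Assign each value to one of k roughly equal-sized rank-based bins.

    Assumption: ties are broken by input order (sorted() is stable).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // n
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two equal-length label lists."""
    total = comb(len(a), 2)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate partitions carry no signal
        return 0.0
    return (sum_cells - expected) / (max_index - expected)


def clustering_coefficient(x, y, max_k=10):
    """Hypothetical clustering-based coefficient in [0, 1].

    Assumption: bin counts 2..max_k are tried for each variable and the
    best agreement between any pair of partitions is returned.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    best = 0.0
    for kx in range(2, max_k + 1):
        px = quantile_partition(x, kx)
        for ky in range(2, max_k + 1):
            py = quantile_partition(y, ky)
            best = max(best, adjusted_rand_index(px, py))
    return best


if __name__ == "__main__":
    x = list(range(-50, 51))
    print(clustering_coefficient(x, [2 * v + 1 for v in x]))  # linear
    print(clustering_coefficient(x, [v * v for v in x]))      # quadratic
```

In this sketch a symmetric quadratic relationship, whose Pearson correlation is zero, still receives a clearly positive score because the partitions of the two variables partly agree. That is the kind of not-only-linear behavior the description refers to.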
Subject
correlation coefficient
nonlinear relationships
clustering
gene expression
Access level
open access
Terms of use
https://creativecommons.org/licenses/by/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/258282

id CONICETDig_9f5bdd9eb2e708c944566aa1ccbb5c7a
oai_identifier_str oai:ri.conicet.gov.ar:11336/258282
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2024
dc.date.none.fl_str_mv 2024-09
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/258282
Pividori, Milton Damián; Ritchie, Marylyn D.; Milone, Diego Humberto; Greene, Casey S.; An efficient, not-only-linear correlation coefficient based on clustering; Cell Press; Cell Systems; 15; 9; 9-2024; 854-868.e3
2405-4712
CONICET Digital
CONICET
url http://hdl.handle.net/11336/258282
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://linkinghub.elsevier.com/retrieve/pii/S2405471224002357
info:eu-repo/semantics/altIdentifier/doi/10.1016/j.cels.2024.08.005
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv Cell Press
publisher.none.fl_str_mv Cell Press
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar