An efficient, not-only-linear correlation coefficient based on clustering
- Authors
- Pividori, Milton Damián; Ritchie, Marylyn D.; Milone, Diego Humberto; Greene, Casey S.
- Publication year
- 2024
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Identifying meaningful patterns in data is crucial for understanding complex biological processes, particularly in transcriptomics, where genes with correlated expression often share functions or contribute to disease mechanisms. Traditional correlation coefficients, which primarily capture linear relationships, may overlook important nonlinear patterns. We introduce the clustermatch correlation coefficient (CCC), a not-only-linear coefficient that utilizes clustering to efficiently detect both linear and nonlinear associations. CCC outperforms standard methods by revealing biologically meaningful patterns that linear-only coefficients miss and is faster than state-of-the-art coefficients such as the maximal information coefficient. When applied to human gene expression data from genotype-tissue expression (GTEx), CCC identified robust linear relationships and nonlinear patterns, such as sex-specific differences, that are undetectable by standard methods. Highly ranked gene pairs were enriched for interactions in integrated networks built from protein-protein interactions, transcription factor regulation, and chemical and genetic perturbations, suggesting that CCC can detect functional relationships missed by linear-only approaches. CCC is a highly efficient, next-generation, not-only-linear correlation coefficient for genome-scale data. A record of this paper’s transparent peer review process is included in the supplemental information.
Affiliation: Pividori, Milton Damián. University of Colorado; United States. University of Pennsylvania; United States. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Ritchie, Marylyn D. University of Pennsylvania; United States
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Greene, Casey S. University of Colorado; United States
- Subject
- correlation coefficient; nonlinear relationships; clustering; gene expression
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/258282
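
The description above says that CCC uses clustering to detect both linear and nonlinear associations, but this record does not spell out the algorithm. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes each variable is split into k rank-based (quantile) clusters for several values of k, and that agreement between the two sets of partitions is scored with the adjusted Rand index, keeping the best score. The names `quantile_partition`, `adjusted_rand_index`, `ccc_sketch`, and the default `k_max=10` are hypothetical choices made for illustration.

```python
from itertools import product
from math import comb


def quantile_partition(values, k):
    """Assign each value to one of k roughly equal-sized clusters by rank.

    Assumption for this sketch: ties are broken by position in the input.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // n
    return labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label lists of equal length."""
    cells, rows, cols = {}, {}, {}
    for la, lb in zip(a, b):
        cells[(la, lb)] = cells.get((la, lb), 0) + 1
        rows[la] = rows.get(la, 0) + 1
        cols[lb] = cols.get(lb, 0) + 1
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(len(a), 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 0.0  # degenerate partitions carry no evidence in this sketch
    return (sum_cells - expected) / (max_index - expected)


def ccc_sketch(x, y, k_max=10):
    """Best partition agreement over all pairs of cluster counts, floored at 0."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, at least 3")
    ks = range(2, min(k_max, n - 1) + 1)
    parts_x = [quantile_partition(x, k) for k in ks]
    parts_y = [quantile_partition(y, k) for k in ks]
    best = max(adjusted_rand_index(px, py) for px, py in product(parts_x, parts_y))
    return max(best, 0.0)


def pearson(x, y):
    """Plain Pearson correlation, included only for contrast."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    return cov / (var_x * var_y) ** 0.5


x = list(range(-50, 51))
y_linear = [2 * xi + 1 for xi in x]
y_quadratic = [xi * xi for xi in x]

print("linear:    pearson=%.3f  sketch=%.3f" % (pearson(x, y_linear), ccc_sketch(x, y_linear)))
print("quadratic: pearson=%.3f  sketch=%.3f" % (pearson(x, y_quadratic), ccc_sketch(x, y_quadratic)))
```

On the symmetric quadratic example the Pearson coefficient is zero while the clustering-based score stays clearly positive, which illustrates the kind of nonlinear pattern the description says linear-only coefficients miss.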
id | CONICETDig_9f5bdd9eb2e708c944566aa1ccbb5c7a |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/258282 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | An efficient, not-only-linear correlation coefficient based on clustering |
dc.creator.none.fl_str_mv | Pividori, Milton Damián; Ritchie, Marylyn D.; Milone, Diego Humberto; Greene, Casey S. |
author | Pividori, Milton Damián |
author_role | author |
author2 | Ritchie, Marylyn D.; Milone, Diego Humberto; Greene, Casey S. |
author2_role | author; author; author |
dc.subject.none.fl_str_mv | correlation coefficient; nonlinear relationships; clustering; gene expression |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
publishDate | 2024 |
dc.date.none.fl_str_mv | 2024-09 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/258282 Pividori, Milton Damián; Ritchie, Marylyn D.; Milone, Diego Humberto; Greene, Casey S.; An efficient, not-only-linear correlation coefficient based on clustering; Cell Press; Cell Systems; 15; 9; 9-2024; 854-868.e3 2405-4712 CONICET Digital CONICET |
url | http://hdl.handle.net/11336/258282 |
dc.language.none.fl_str_mv | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://linkinghub.elsevier.com/retrieve/pii/S2405471224002357 info:eu-repo/semantics/altIdentifier/doi/10.1016/j.cels.2024.08.005 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/2.5/ar/ |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | Cell Press |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
_version_ | 1844613044939784192 |
score | 13.070432 |