Estimating the mutual information between two discrete, asymmetric variables with limited samples

Authors
Hernández Lahme, Damián Gabriel; Samengo, Ines
Year of publication
2019
Language
English
Resource type
article
Status
published version
Description
Determining the strength of nonlinear, statistical dependencies between two variables is a crucial matter in many research fields. The established measure for quantifying such relations is the mutual information. However, estimating mutual information from limited samples is a challenging task. Since the mutual information is the difference of two entropies, the existing Bayesian estimators of entropy may be used to estimate information. This procedure, however, is still biased in the severely undersampled regime. Here, we propose an alternative estimator that is applicable to those cases in which the marginal distribution of one of the two variables (the one with minimal entropy) is well sampled. The other variable, as well as the joint and conditional distributions, can be severely undersampled. We obtain a consistent estimator that presents very low bias, outperforming previous methods even when the sampled data contain few coincidences. As with other Bayesian estimators, our proposal focuses on the strength of the interaction between the two variables, without seeking to model the specific way in which they are related. A distinctive property of our method is that the main data statistic determining the amount of mutual information is the inhomogeneity of the conditional distribution of the low-entropy variable in those states in which the large-entropy variable registers coincidences.
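To make the quantity under discussion concrete, the sketch below computes the naive plug-in (maximum-likelihood) estimate of the mutual information from paired samples. This is NOT the Bayesian estimator proposed in the paper; it is the standard baseline, and it exhibits exactly the bias in the undersampled regime that the paper addresses. Function and variable names are illustrative.

```python
from collections import Counter
from math import log2

def plugin_mutual_information(pairs):
    """Naive plug-in estimate of I(X;Y) in bits from paired samples.

    Illustrative baseline only: this maximum-likelihood estimator is
    known to be strongly biased when samples are scarce, which is the
    regime the paper's Bayesian estimator targets.
    """
    n = len(pairs)
    pxy = Counter(pairs)                  # empirical joint counts
    px = Counter(x for x, _ in pairs)     # empirical marginal of X
    py = Counter(y for _, y in pairs)     # empirical marginal of Y
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # I(X;Y) = sum_xy p(x,y) * log2( p(x,y) / (p(x) p(y)) )
        mi += p_joint * log2(p_joint * n * n / (px[x] * py[y]))
    return mi

# Perfectly correlated binary variables: I(X;Y) = H(X) = 1 bit
samples = [(0, 'a'), (1, 'b')] * 50
print(plugin_mutual_information(samples))  # 1.0
```

On independent samples the estimate is near zero in the well-sampled limit, but with few samples per state it is biased upward, which is why corrected estimators such as the one in this paper are needed.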
Fil: Hernández Lahme, Damián Gabriel. Comisión Nacional de Energía Atómica. Gerencia del Área de Investigación y Aplicaciones No Nucleares. Gerencia de Física (Centro Atómico Bariloche); Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Fil: Samengo, Ines. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Investigación y Aplicaciones No Nucleares. Gerencia de Física (Centro Atómico Bariloche); Argentina
Subjects
BAYESIAN ESTIMATION
MUTUAL INFORMATION
BIAS
SAMPLING
Accessibility level
open access
Terms of use
https://creativecommons.org/licenses/by/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI Identifier
oai:ri.conicet.gov.ar:11336/121475

Identifier
http://hdl.handle.net/11336/121475
Citation
Hernández Lahme, Damián Gabriel; Samengo, Ines; Estimating the mutual information between two discrete, asymmetric variables with limited samples; Molecular Diversity Preservation International; Entropy; 21; 6; 6-2019; 1-20
ISSN
1099-4300
Alternative identifiers
https://www.mdpi.com/1099-4300/21/6/623
DOI: 10.3390/e21060623