Estimating the mutual information between two discrete, asymmetric variables with limited samples
- Authors
- Hernández Lahme, Damián Gabriel; Samengo, Ines
- Publication year
- 2019
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Determining the strength of nonlinear, statistical dependencies between two variables is a crucial matter in many research fields. The established measure for quantifying such relations is the mutual information. However, estimating mutual information from limited samples is a challenging task. Since the mutual information is the difference of two entropies, the existing Bayesian estimators of entropy may be used to estimate information. This procedure, however, is still biased in the severely undersampled regime. Here, we propose an alternative estimator that is applicable to those cases in which the marginal distribution of one of the two variables (the one with minimal entropy) is well sampled. The other variable, as well as the joint and conditional distributions, can be severely undersampled. We obtain a consistent estimator that presents very low bias, outperforming previous methods even when the sampled data contain few coincidences. As with other Bayesian estimators, our proposal focuses on the strength of the interaction between the two variables, without seeking to model the specific way in which they are related. A distinctive property of our method is that the main data statistic determining the amount of mutual information is the inhomogeneity of the conditional distribution of the low-entropy variable in those states in which the large-entropy variable registers coincidences.
Affiliation: Hernández Lahme, Damián Gabriel. Comisión Nacional de Energía Atómica. Gerencia del Área de Investigación y Aplicaciones No Nucleares. Gerencia de Física (Centro Atómico Bariloche); Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Samengo, Ines. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Investigación y Aplicaciones No Nucleares. Gerencia de Física (Centro Atómico Bariloche); Argentina
- Subject
- BAYESIAN ESTIMATION
- MUTUAL INFORMATION
- BIAS
- SAMPLING
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/121475
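The description above states that mutual information is the difference of two entropies and that estimates from limited samples are biased. A minimal Python sketch of that point follows. It uses the naive plug-in (maximum-likelihood) entropy estimate, which is neither the estimator proposed in the article nor one of the Bayesian estimators it mentions. The function names, the 1000-state variable, the binary variable, and the sample sizes are all assumptions chosen only for illustration.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (maximum-likelihood) entropy, in bits, of a list of hashable samples."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def plugin_mutual_information(xs, ys):
    """Naive estimate I(X;Y) = H(X) + H(Y) - H(X,Y), with every entropy plug-in."""
    joint = list(zip(xs, ys))
    return plugin_entropy(xs) + plugin_entropy(ys) - plugin_entropy(joint)


def simulate(n_samples, n_states_x=1000, seed=0):
    """Estimate I(X;Y) for independent X and Y, whose true mutual information is 0.

    X is uniform over n_states_x states (hypothetical large-entropy variable);
    Y is a fair binary variable (hypothetical low-entropy variable).
    """
    rng = random.Random(seed)
    xs = [rng.randrange(n_states_x) for _ in range(n_samples)]
    ys = [rng.randrange(2) for _ in range(n_samples)]
    return plugin_mutual_information(xs, ys)


if __name__ == "__main__":
    # Print the estimate for increasingly large samples.
    for n in (50, 500, 50000):
        print(n, round(simulate(n), 3))
```

Because X and Y are drawn independently, the true mutual information is zero. Any positive value the sketch reports at small sample sizes is therefore estimation bias, and it shrinks as the sample grows.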
View the full record metadata
id | CONICETDig_14044211b14cdb1d3eb8a4b4aa4f2f6c |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/121475 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | Estimating the mutual information between two discrete, asymmetric variables with limited samples |
dc.creator.none.fl_str_mv | Hernández Lahme, Damián Gabriel; Samengo, Ines |
author | Hernández Lahme, Damián Gabriel |
author_facet | Hernández Lahme, Damián Gabriel; Samengo, Ines |
author_role | author |
author2 | Samengo, Ines |
author2_role | author |
dc.subject.none.fl_str_mv | BAYESIAN ESTIMATION; MUTUAL INFORMATION; BIAS; SAMPLING |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
dc.description.none.fl_txt_mv | Determining the strength of nonlinear, statistical dependencies between two variables is a crucial matter in many research fields. The established measure for quantifying such relations is the mutual information. However, estimating mutual information from limited samples is a challenging task. Since the mutual information is the difference of two entropies, the existing Bayesian estimators of entropy may be used to estimate information. This procedure, however, is still biased in the severely undersampled regime. Here, we propose an alternative estimator that is applicable to those cases in which the marginal distribution of one of the two variables (the one with minimal entropy) is well sampled. The other variable, as well as the joint and conditional distributions, can be severely undersampled. We obtain a consistent estimator that presents very low bias, outperforming previous methods even when the sampled data contain few coincidences. As with other Bayesian estimators, our proposal focuses on the strength of the interaction between the two variables, without seeking to model the specific way in which they are related. A distinctive property of our method is that the main data statistic determining the amount of mutual information is the inhomogeneity of the conditional distribution of the low-entropy variable in those states in which the large-entropy variable registers coincidences. Affiliation: Hernández Lahme, Damián Gabriel. Comisión Nacional de Energía Atómica. Gerencia del Área de Investigación y Aplicaciones No Nucleares. Gerencia de Física (Centro Atómico Bariloche); Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Affiliation: Samengo, Ines. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Investigación y Aplicaciones No Nucleares. Gerencia de Física (Centro Atómico Bariloche); Argentina |
publishDate | 2019 |
dc.date.none.fl_str_mv | 2019-06 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/121475 Hernández Lahme, Damián Gabriel; Samengo, Ines; Estimating the mutual information between two discrete, asymmetric variables with limited samples; Molecular Diversity Preservation International; Entropy; 21; 6; 6-2019; 1-20 1099-4300 CONICET Digital CONICET |
url | http://hdl.handle.net/11336/121475 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://www.mdpi.com/1099-4300/21/6/623 info:eu-repo/semantics/altIdentifier/doi/10.3390/e21060623 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | https://creativecommons.org/licenses/by/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf application/pdf application/pdf |
dc.publisher.none.fl_str_mv | Molecular Diversity Preservation International |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |