Inferring a Property of a Large System from a Small Number of Samples

Autores
Hernández Lahme, Damián Gabriel; Samengo, Ines
Año de publicación
2022
Idioma
inglés
Tipo de recurso
artículo
Estado
versión publicada
Descripción
Inferring the value of a property of a large stochastic system is a difficult task when the number of samples is insufficient to reliably estimate the probability distribution. The Bayesian estimator of the property of interest requires the knowledge of the prior distribution, and in many situations, it is not clear which prior should be used. Several estimators have been developed so far in which the proposed prior us individually tailored for each property of interest; such is the case, for example, for the entropy, the amount of mutual information, or the correlation between pairs of variables. In this paper, we propose a general framework to select priors that is valid for arbitrary properties. We first demonstrate that only certain aspects of the prior distribution actually affect the inference process. We then expand the sought prior as a linear combination of a one-dimensional family of indexed priors, each of which is obtained through a maximum entropy approach with constrained mean values of the property under study. In many cases of interest, only one or very few components of the expansion turn out to contribute to the Bayesian estimator, so it is often valid to only keep a single component. The relevant component is selected by the data, so no handcrafted priors are required. We test the performance of this approximation with a few paradigmatic examples and show that it performs well in comparison to the ad-hoc methods previously proposed in the literature. Our method highlights the connection between Bayesian inference and equilibrium statistical mechanics, since the most relevant component of the expansion can be argued to be that with the right temperature.
Fil: Hernández Lahme, Damián Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina. Comisión Nacional de Energía Atómica. Centro Atómico Bariloche; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; Argentina
Fil: Samengo, Ines. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina. Comisión Nacional de Energía Atómica. Centro Atómico Bariloche; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; Argentina
Materia
BAYESIAN
ENTROPY
INFERENCE
MUTUAL INFORMATION
UNDERSAMPLED
Nivel de accesibilidad
acceso abierto
Condiciones de uso
https://creativecommons.org/licenses/by/2.5/ar/
Repositorio
CONICET Digital (CONICET)
Institución
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI Identificador
oai:ri.conicet.gov.ar:11336/188161

id CONICETDig_98d7f667dac859e63af53c464de1342b
oai_identifier_str oai:ri.conicet.gov.ar:11336/188161
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
spelling Inferring a Property of a Large System from a Small Number of SamplesHernández Lahme, Damián GabrielSamengo, InesBAYESIANENTROPYINFERENCEMUTUAL INFORMATIONUNDERSAMPLEDhttps://purl.org/becyt/ford/1.3https://purl.org/becyt/ford/1https://purl.org/becyt/ford/1.1https://purl.org/becyt/ford/1Inferring the value of a property of a large stochastic system is a difficult task when the number of samples is insufficient to reliably estimate the probability distribution. The Bayesian estimator of the property of interest requires the knowledge of the prior distribution, and in many situations, it is not clear which prior should be used. Several estimators have been developed so far in which the proposed prior us individually tailored for each property of interest; such is the case, for example, for the entropy, the amount of mutual information, or the correlation between pairs of variables. In this paper, we propose a general framework to select priors that is valid for arbitrary properties. We first demonstrate that only certain aspects of the prior distribution actually affect the inference process. We then expand the sought prior as a linear combination of a one-dimensional family of indexed priors, each of which is obtained through a maximum entropy approach with constrained mean values of the property under study. In many cases of interest, only one or very few components of the expansion turn out to contribute to the Bayesian estimator, so it is often valid to only keep a single component. The relevant component is selected by the data, so no handcrafted priors are required. We test the performance of this approximation with a few paradigmatic examples and show that it performs well in comparison to the ad-hoc methods previously proposed in the literature. Our method highlights the connection between Bayesian inference and equilibrium statistical mechanics, since the most relevant component of the expansion can be argued to be that with the right temperature.Fil: Hernández Lahme, Damián Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina. Comisión Nacional de Energía Atómica. Centro Atómico Bariloche; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; ArgentinaFil: Samengo, Ines. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina. Comisión Nacional de Energía Atómica. Centro Atómico Bariloche; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; ArgentinaMolecular Diversity Preservation International2022-01info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionhttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdfapplication/pdfhttp://hdl.handle.net/11336/188161Hernández Lahme, Damián Gabriel; Samengo, Ines; Inferring a Property of a Large System from a Small Number of Samples; Molecular Diversity Preservation International; Entropy; 24; 1; 1-2022; 1-171099-4300CONICET DigitalCONICETenginfo:eu-repo/semantics/altIdentifier/url/https://www.mdpi.com/1099-4300/24/1/125/htminfo:eu-repo/semantics/altIdentifier/doi/10.3390/e24010125info:eu-repo/semantics/openAccesshttps://creativecommons.org/licenses/by/2.5/ar/reponame:CONICET Digital (CONICET)instname:Consejo Nacional de Investigaciones Científicas y Técnicas2025-09-29T10:20:55Zoai:ri.conicet.gov.ar:11336/188161instacron:CONICETInstitucionalhttp://ri.conicet.gov.ar/Organismo científico-tecnológicoNo correspondehttp://ri.conicet.gov.ar/oai/requestdasensio@conicet.gov.ar; lcarlino@conicet.gov.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:34982025-09-29 10:20:55.667CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicasfalse
dc.title.none.fl_str_mv Inferring a Property of a Large System from a Small Number of Samples
title Inferring a Property of a Large System from a Small Number of Samples
spellingShingle Inferring a Property of a Large System from a Small Number of Samples
Hernández Lahme, Damián Gabriel
BAYESIAN
ENTROPY
INFERENCE
MUTUAL INFORMATION
UNDERSAMPLED
title_short Inferring a Property of a Large System from a Small Number of Samples
title_full Inferring a Property of a Large System from a Small Number of Samples
title_fullStr Inferring a Property of a Large System from a Small Number of Samples
title_full_unstemmed Inferring a Property of a Large System from a Small Number of Samples
title_sort Inferring a Property of a Large System from a Small Number of Samples
dc.creator.none.fl_str_mv Hernández Lahme, Damián Gabriel
Samengo, Ines
author Hernández Lahme, Damián Gabriel
author_facet Hernández Lahme, Damián Gabriel
Samengo, Ines
author_role author
author2 Samengo, Ines
author2_role author
dc.subject.none.fl_str_mv BAYESIAN
ENTROPY
INFERENCE
MUTUAL INFORMATION
UNDERSAMPLED
topic BAYESIAN
ENTROPY
INFERENCE
MUTUAL INFORMATION
UNDERSAMPLED
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.3
https://purl.org/becyt/ford/1
https://purl.org/becyt/ford/1.1
https://purl.org/becyt/ford/1
dc.description.none.fl_txt_mv Inferring the value of a property of a large stochastic system is a difficult task when the number of samples is insufficient to reliably estimate the probability distribution. The Bayesian estimator of the property of interest requires the knowledge of the prior distribution, and in many situations, it is not clear which prior should be used. Several estimators have been developed so far in which the proposed prior us individually tailored for each property of interest; such is the case, for example, for the entropy, the amount of mutual information, or the correlation between pairs of variables. In this paper, we propose a general framework to select priors that is valid for arbitrary properties. We first demonstrate that only certain aspects of the prior distribution actually affect the inference process. We then expand the sought prior as a linear combination of a one-dimensional family of indexed priors, each of which is obtained through a maximum entropy approach with constrained mean values of the property under study. In many cases of interest, only one or very few components of the expansion turn out to contribute to the Bayesian estimator, so it is often valid to only keep a single component. The relevant component is selected by the data, so no handcrafted priors are required. We test the performance of this approximation with a few paradigmatic examples and show that it performs well in comparison to the ad-hoc methods previously proposed in the literature. Our method highlights the connection between Bayesian inference and equilibrium statistical mechanics, since the most relevant component of the expansion can be argued to be that with the right temperature.
Fil: Hernández Lahme, Damián Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina. Comisión Nacional de Energía Atómica. Centro Atómico Bariloche; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; Argentina
Fil: Samengo, Ines. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina. Comisión Nacional de Energía Atómica. Centro Atómico Bariloche; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; Argentina
description Inferring the value of a property of a large stochastic system is a difficult task when the number of samples is insufficient to reliably estimate the probability distribution. The Bayesian estimator of the property of interest requires the knowledge of the prior distribution, and in many situations, it is not clear which prior should be used. Several estimators have been developed so far in which the proposed prior us individually tailored for each property of interest; such is the case, for example, for the entropy, the amount of mutual information, or the correlation between pairs of variables. In this paper, we propose a general framework to select priors that is valid for arbitrary properties. We first demonstrate that only certain aspects of the prior distribution actually affect the inference process. We then expand the sought prior as a linear combination of a one-dimensional family of indexed priors, each of which is obtained through a maximum entropy approach with constrained mean values of the property under study. In many cases of interest, only one or very few components of the expansion turn out to contribute to the Bayesian estimator, so it is often valid to only keep a single component. The relevant component is selected by the data, so no handcrafted priors are required. We test the performance of this approximation with a few paradigmatic examples and show that it performs well in comparison to the ad-hoc methods previously proposed in the literature. Our method highlights the connection between Bayesian inference and equilibrium statistical mechanics, since the most relevant component of the expansion can be argued to be that with the right temperature.
publishDate 2022
dc.date.none.fl_str_mv 2022-01
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/188161
Hernández Lahme, Damián Gabriel; Samengo, Ines; Inferring a Property of a Large System from a Small Number of Samples; Molecular Diversity Preservation International; Entropy; 24; 1; 1-2022; 1-17
1099-4300
CONICET Digital
CONICET
url http://hdl.handle.net/11336/188161
identifier_str_mv Hernández Lahme, Damián Gabriel; Samengo, Ines; Inferring a Property of a Large System from a Small Number of Samples; Molecular Diversity Preservation International; Entropy; 24; 1; 1-2022; 1-17
1099-4300
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://www.mdpi.com/1099-4300/24/1/125/htm
info:eu-repo/semantics/altIdentifier/doi/10.3390/e24010125
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv Molecular Diversity Preservation International
publisher.none.fl_str_mv Molecular Diversity Preservation International
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
_version_ 1844614194374115328
score 13.070432