Information-theoretical analysis of the statistical dependencies among three variables: Applications to written language

Authors
Hernández Lahme, Damián Gabriel; Zanette, Damian Horacio; Samengo, Ines
Publication year
2015
Language
English
Resource type
article
Status
published version
Description
We develop the information-theoretical concepts required to study the statistical dependencies among three variables. Some of these dependencies are pure triple interactions, in the sense that they cannot be explained in terms of a combination of pairwise correlations. We derive bounds for triple dependencies, and characterize the shape of the joint probability distribution of three binary variables with high triple interaction. The analysis also allows us to quantify the amount of redundancy in the mutual information between pairs of variables, and to assess whether the information between two variables is or is not mediated by a third variable. These concepts are applied to the analysis of written texts. We find that the probability that a given word is found in a particular location within the text is modulated not only by the presence or absence of other nearby words, but also by the presence or absence of nearby pairs of words. We identify the words enclosing the key semantic concepts of the text, the triplets of words with high pairwise and triple interactions, and the words that mediate the pairwise interactions between other words.
Affiliation: Hernández Lahme, Damián Gabriel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Area de Investigación y Aplicaciones No Nucleares. Gerencia de Física (Centro Atómico Bariloche); Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; Argentina
Affiliation: Zanette, Damian Horacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Area de Investigación y Aplicaciones No Nucleares. Gerencia de Física (Centro Atómico Bariloche); Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; Argentina
Affiliation: Samengo, Ines. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Area de Investigación y Aplicaciones No Nucleares. Gerencia de Física (Centro Atómico Bariloche); Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; Argentina
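The idea of a pure triple interaction mentioned in the description can be illustrated with a small sketch. It uses the standard co-information, I(X;Y) - I(X;Y|Z), which is an assumption made here for illustration; the article's own measures and bounds may be defined differently. The function names and the two toy distributions (`xor` and `copy`) are hypothetical and do not come from the article.

```python
import itertools
import math
from collections import defaultdict


def entropy(dist, axes):
    """Shannon entropy in bits of the marginal over the variable indices in `axes`."""
    marginal = defaultdict(float)
    for outcome, p in dist.items():
        marginal[tuple(outcome[i] for i in axes)] += p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def mutual_information(dist, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return entropy(dist, [a]) + entropy(dist, [b]) - entropy(dist, [a, b])


def conditional_mutual_information(dist, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return (entropy(dist, [a, c]) + entropy(dist, [b, c])
            - entropy(dist, [a, b, c]) - entropy(dist, [c]))


def co_information(dist):
    """I(X;Y) - I(X;Y|Z): positive suggests redundancy, negative suggests synergy."""
    return (mutual_information(dist, 0, 1)
            - conditional_mutual_information(dist, 0, 1, 2))


# Toy case 1: Z = X XOR Y with X, Y independent fair bits.
# Every pair of variables is independent, yet the three are fully dependent.
xor = {(x, y, x ^ y): 0.25 for x, y in itertools.product((0, 1), repeat=2)}

# Toy case 2: three identical copies of one fair bit (fully redundant).
copy = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

if __name__ == "__main__":
    pairs = [(0, 1), (0, 2), (1, 2)]
    print("XOR pairwise MI:", [mutual_information(xor, a, b) for a, b in pairs])
    print("XOR co-information:", co_information(xor))
    print("copy co-information:", co_information(copy))
```

In the XOR case all pairwise mutual informations vanish while the co-information is non-zero, so the dependency cannot be attributed to any pair. In the copy case the information shared by X and Y is entirely accounted for by Z, which corresponds to the redundant, mediated situation.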
Subject
Information Theory
Correlations
Language
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/57147

id CONICETDig_0ed913fdd320656e8c2e7d897291bd07
oai_identifier_str oai:ri.conicet.gov.ar:11336/57147
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2015
dc.date.none.fl_str_mv 2015-08
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/57147
Hernández Lahme, Damián Gabriel; Zanette, Damian Horacio; Samengo, Ines; Information-theoretical analysis of the statistical dependencies among three variables: Applications to written language; American Physical Society; Physical Review E: Statistical, Nonlinear and Soft Matter Physics; 92; 2; 8-2015; 22813-22830
1063-651X
1539-3755
CONICET Digital
CONICET
url http://hdl.handle.net/11336/57147
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/doi/10.1103/PhysRevE.92.022813
info:eu-repo/semantics/altIdentifier/url/https://journals.aps.org/pre/abstract/10.1103/PhysRevE.92.022813
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv American Physical Society
publisher.none.fl_str_mv American Physical Society
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar