Information Approach to Co-occurrence of Words in Written Language
- Authors
- Hernández Lahme, Damián Gabriel
- Publication year
- 2015
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- In this paper we study the distribution of words across the different parts of a book using tools from information theory. In particular, the mutual information between words in the text and parts of the text is compared with the mutual information of a shuffled version of the book. This analysis allows us to extract not only relevant words of the text but also relationships between the different words, such as co-occurrence and repulsion between them. With the connections due to co-occurrence of words, we show how to construct a network that reflects the semantic organization of the book. This method can be applied to other types of sequences, measuring the relations between the different symbols that compose such sequences.
Affiliation: Hernández Lahme, Damián Gabriel. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; Argentina. Comisión Nacional de Energía Atómica. Centro Atómico Bariloche; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
- Subject
- INFORMATION
- CO-OCCURRENCE
- WORDS
- LANGUAGE
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/54970
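
The comparison described in the abstract, mutual information between words and parts of the text versus the same quantity for a shuffled version of the book, can be illustrated with a minimal Python sketch. This is not the paper's estimator: the binary "token equals word" indicator, the equal-size contiguous parts, the number of shuffles, and all function names (`split_into_parts`, `word_part_information`, `excess_information`) are assumptions made here for illustration only.

```python
import math
import random


def split_into_parts(tokens, n_parts):
    """Split a token list into at most n_parts contiguous chunks of near-equal size."""
    size = math.ceil(len(tokens) / n_parts)
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def word_part_information(tokens, word, n_parts):
    """Mutual information (in bits) between the indicator 'token == word'
    and the index of the part in which the token appears (assumed definition)."""
    parts = split_into_parts(tokens, n_parts)
    total = len(tokens)
    p_word = sum(t == word for t in tokens) / total
    info = 0.0
    for part in parts:
        p_part = len(part) / total
        hits = sum(t == word for t in part)
        # Two outcomes of the indicator: the token is the word, or it is not.
        for count, p_x in ((hits, p_word), (len(part) - hits, 1.0 - p_word)):
            p_joint = count / total
            if p_joint > 0:
                info += p_joint * math.log2(p_joint / (p_x * p_part))
    return info


def excess_information(tokens, word, n_parts, n_shuffles=20, seed=0):
    """Observed information minus the average over shuffled copies of the text."""
    rng = random.Random(seed)
    observed = word_part_information(tokens, word, n_parts)
    shuffled = list(tokens)
    baseline = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        baseline += word_part_information(shuffled, word, n_parts)
    return observed - baseline / n_shuffles


# Toy example: "a" is confined to the first half of the text.
clustered = ["a"] * 10 + ["b"] * 10
print(word_part_information(clustered, "a", 2))
print(excess_information(clustered, "a", 2))
```

Subtracting the shuffled baseline matters because a finite text yields a small positive information value even when a word is placed at random; only the excess over that baseline suggests that the word is tied to particular parts of the text.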
Full record metadata
id | CONICETDig_99a52fd102cff725e829c7d75b77501c |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/54970 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
title | Information Approach to Co-occurrence of Words in Written Language |
author | Hernández Lahme, Damián Gabriel |
author_role | author |
topic | INFORMATION CO-OCCURRENCE WORDS LANGUAGE |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.3 https://purl.org/becyt/ford/1 |
publishDate | 2015 |
dc.date.none.fl_str_mv | 2015-06 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
url | http://hdl.handle.net/11336/54970 |
identifier_str_mv | Hernández Lahme, Damián Gabriel; Information Approach to Co-occurrence of Words in Written Language; Complex Systems Publications; Complex systems; 24; 2; 6-2015; 1-21 0891-2513 CONICET Digital CONICET |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/http://www.complex-systems.com/abstracts/v24_i02_a03/ |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv | openAccess |
dc.format.none.fl_str_mv | application/pdf |
publisher.none.fl_str_mv | Complex Systems Publications |
reponame_str | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |