Coherent oscillations in word-use data from 1700 to 2008

Authors
Montemurro, Marcelo Alejandro; Zanette, Damian Horacio
Year of publication
2016
Language
English
Resource type
article
Status
published version
Description
In written language, the choice of specific words is constrained by both grammatical requirements and the specific semantic context of the message to be transmitted. To a significant degree, the semantic context is in turn affected by a broad cultural and historical environment, which also influences matters of style and manners. Over time, those environmental factors leave an imprint in the statistics of language use, with some words becoming more common and other words being preferred less. Here we characterize the patterns of language use over time based on word statistics extracted from more than 4.5 million books written over a period of 308 years. We find evidence of novel systematic oscillatory patterns in word use with a consistent period narrowly distributed around 14 years. The specific phase relationships between different words show structure at two independent levels: first, there is a weak global phase modulation that is primarily linked to overall shifts in the vocabulary across time; and second, a stronger component dependent on well-defined semantic relationships between words. In particular, complex network analysis reveals that semantically related words show strong phase coherence. Ultimately, these previously unknown patterns in the statistics of language may be a consequence of changes in the cultural framework that influences the thematic focus of writers.
Affiliation: Montemurro, Marcelo Alejandro. University of Manchester; United Kingdom
Affiliation: Zanette, Damian Horacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Investigación y Aplicaciones No Nucleares. Gerencia de Física (Centro Atómico Bariloche); Argentina. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; Argentina
Subject
LANGUAGE STATISTICS
WORD USE
GOOGLE NGRAMS
Access level
open access
Terms of use
https://creativecommons.org/licenses/by/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/74505

id CONICETDig_db18dbac7e24a17a2c18857effd20d9a
network_acronym_str CONICETDig
repository_id_str 3498
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2016
dc.date.none.fl_str_mv 2016-12-05
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/74505
Montemurro, Marcelo Alejandro; Zanette, Damian Horacio; Coherent oscillations in word-use data from 1700 to 2008; Palgrave Macmillan Ltd; Palgrave Communications; 2; 5-12-2016; 1-9
2055-1045
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://www.nature.com/articles/palcomms201684
info:eu-repo/semantics/altIdentifier/doi/10.1057/palcomms.2016.84
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv Palgrave Macmillan Ltd
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar