Word frequency–rank relationship in tagged texts

Authors
Chacoma, Andrés Alberto; Zanette, Damian Horacio
Publication year
2021
Language
English
Resource type
article
Status
published version
Description
We analyze the frequency–rank relationship in sub-vocabularies corresponding to three different grammatical classes (nouns, verbs, and others) in a collection of literary works in English, whose words have been automatically tagged according to their grammatical role. Comparing against a null hypothesis that assumes that the words belonging to each class are uniformly distributed across the frequency-ranked vocabulary of the whole work, we find statistically significant differences between the three classes. These results suggest that frequency–rank relationships may reflect linguistic features associated with grammatical function.
Affiliation: Chacoma, Andrés Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Física Enrique Gaviola. Universidad Nacional de Córdoba. Instituto de Física Enrique Gaviola; Argentina
Affiliation: Zanette, Damian Horacio. Comisión Nacional de Energía Atómica. Gerencia del Área de Energía Nuclear. Instituto Balseiro; Argentina. Comisión Nacional de Energía Atómica. Centro Atómico Bariloche; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Patagonia Norte; Argentina
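The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy tagging scheme, the rule that a word belongs to the class it is tagged with most often, and the choice of the mean global rank as the test statistic are all assumptions made for illustration. The sketch ranks the whole vocabulary by frequency, extracts the frequency–rank list of one grammatical class, and samples a null model in which the words of that class are placed uniformly at random across the ranked vocabulary.

```python
import random
from collections import Counter


def rank_vocabulary(tagged_tokens):
    """Rank the whole vocabulary by decreasing frequency.

    tagged_tokens is a list of (word, tag) pairs (hypothetical input format).
    Returns the ranked word list, the word counts, and each word's class.
    """
    counts = Counter(word for word, _ in tagged_tokens)
    # Assumption: a word belongs to the class it is tagged with most often.
    tag_counts = {}
    for word, tag in tagged_tokens:
        tag_counts.setdefault(word, Counter())[tag] += 1
    word_class = {w: c.most_common(1)[0][0] for w, c in tag_counts.items()}
    # Ties in frequency are broken alphabetically to keep the ranking deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked, counts, word_class


def sub_vocabulary(ranked, counts, word_class, cls):
    """Return (sub_rank, global_rank, word, frequency) for the words of one class."""
    members = [(g, w) for g, w in enumerate(ranked, start=1) if word_class[w] == cls]
    return [(s, g, w, counts[w]) for s, (g, w) in enumerate(members, start=1)]


def null_mean_global_rank(n_vocab, n_class, n_samples=1000, seed=0):
    """Sample the mean global rank of n_class words placed uniformly at random
    (without replacement) among n_vocab ranks: a simple stand-in for the
    uniform-distribution null hypothesis."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        ranks = rng.sample(range(1, n_vocab + 1), n_class)
        means.append(sum(ranks) / n_class)
    return means


if __name__ == "__main__":
    # Toy tagged text; real input would come from an automatic tagger.
    tokens = [("the", "other"), ("cat", "noun"), ("sat", "verb"),
              ("the", "other"), ("dog", "noun"), ("ran", "verb"),
              ("the", "other"), ("cat", "noun")]
    ranked, counts, word_class = rank_vocabulary(tokens)
    nouns = sub_vocabulary(ranked, counts, word_class, "noun")
    observed = sum(g for _, g, _, _ in nouns) / len(nouns)
    null = null_mean_global_rank(len(ranked), len(nouns))
    # Fraction of null samples with a mean rank at least as low as observed.
    p_low = sum(m <= observed for m in null) / len(null)
    print(nouns, observed, p_low)
```

In a real analysis the observed statistic for each class would be compared with the spread of the null samples; a different statistic, such as the full sub-vocabulary frequency–rank curve, could be substituted without changing the structure of the sketch.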
Subject
FREQUENCY–RANK STATISTICS
GRAMMATICAL FUNCTION
LANGUAGE PROCESSING
LINGUISTIC REGULARITIES
QUANTITATIVE LINGUISTICS
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/185074

id CONICETDig_934bdffe8d3ca72c3af53515b46c702a
network_acronym_str CONICETDig
repository_id_str 3498
dc.title.none.fl_str_mv Word frequency–rank relationship in tagged texts
dc.creator.none.fl_str_mv Chacoma, Andrés Alberto
Zanette, Damian Horacio
dc.subject.none.fl_str_mv FREQUENCY–RANK STATISTICS
GRAMMATICAL FUNCTION
LANGUAGE PROCESSING
LINGUISTIC REGULARITIES
QUANTITATIVE LINGUISTICS
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.3
https://purl.org/becyt/ford/1
publishDate 2021
dc.date.none.fl_str_mv 2021-07-15
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/185074
Chacoma, Andrés Alberto; Zanette, Damian Horacio; Word frequency–rank relationship in tagged texts; Elsevier Science; Physica A: Statistical Mechanics and its Applications; 574; 15-7-2021; 1-10
0378-4371
1873-2119
CONICET Digital
CONICET
url http://hdl.handle.net/11336/185074
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://www.sciencedirect.com/science/article/pii/S0378437121002922?via%3Dihub
info:eu-repo/semantics/altIdentifier/doi/10.1016/j.physa.2021.126020
info:eu-repo/semantics/altIdentifier/url/https://arxiv.org/abs/2102.10992
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Elsevier Science
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar