The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing

Authors
Carrascosa, Rafael; Coste, François; Galle, Matthias; Infante Lopez, Gabriel Gaston
Publication year
2011
Language
English
Resource type
article
Status
published version
Description
The smallest grammar problem, namely finding a smallest context-free grammar that generates exactly one sequence, is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose a new perspective on this problem by splitting it into two tasks: (1) choosing which words will be the constituents of the grammar and (2) searching for the smallest grammar given this set of constituents. We show how to solve the second task in polynomial time by parsing longer constituents with smaller ones. We propose new algorithms, based on classical practical algorithms, that use this optimization to find small grammars. Our algorithms consistently find smaller grammars on a classical benchmark, reducing the size by 10% in some cases. Moreover, our formulation allows us to define interesting bounds on the number of small grammars and to empirically compare different grammars of small size.
Affiliation: Carrascosa, Rafael. Universidad Nacional de Córdoba; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina
Affiliation: Coste, François. Institut National de Recherche en Informatique et en Automatique; France
Affiliation: Galle, Matthias. Institut National de Recherche en Informatique et en Automatique; France
Affiliation: Infante Lopez, Gabriel Gaston. Universidad Nacional de Córdoba; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina
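The second task in the description above (finding a small grammar once the constituents are fixed) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, each word is simply split into the fewest symbols by dynamic programming, and grammar size is assumed to be the total length of all right-hand sides, which may differ from the measure used in the article.

```python
def shortest_parse(word, constituents):
    """Split `word` into the fewest symbols: single characters or other constituents."""
    n = len(word)
    # A word may not be parsed with itself; one-letter constituents are just terminals.
    usable = [q for q in constituents if q != word and len(q) >= 2]
    best = [0] + [n + 1] * n   # best[i]: fewest symbols covering word[:i]
    back = [None] * (n + 1)    # back[i]: last symbol of one optimal cover of word[:i]
    for i in range(1, n + 1):
        # Default choice: end the cover with the single character word[i-1].
        best[i], back[i] = best[i - 1] + 1, word[i - 1]
        for q in usable:
            j = i - len(q)
            if j >= 0 and word[j:i] == q and best[j] + 1 < best[i]:
                best[i], back[i] = best[j] + 1, q
    # Walk the back-pointers from the end of the word to recover the parse.
    symbols, i = [], n
    while i > 0:
        symbols.append(back[i])
        i -= len(back[i])
    return symbols[::-1]


def minimal_grammar(sequence, constituents):
    """Build one rule per word reachable from `sequence`, each with a shortest parse."""
    rules = {}
    pending = [sequence]
    while pending:
        word = pending.pop()
        if word in rules:
            continue
        rules[word] = shortest_parse(word, constituents)
        # Symbols longer than one character are constituents needing their own rule.
        pending.extend(sym for sym in rules[word] if len(sym) > 1)
    return rules


def grammar_size(rules):
    """Assumed size measure: total number of symbols on all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())
```

For instance, minimal_grammar("abcabcab", ["abc", "ab"]) gives the rules abcabcab → abc abc ab, abc → ab c and ab → a b, for a total size of 7 under the assumed measure. Constituents that no parse ends up using get no rule, so the sketch only guarantees a shortest parse per word, not the best possible choice of constituents.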
Subject
DATA DISCOVERY
HIERARCHICAL STRUCTURE INFERENCE
OPTIMAL PARSING
SMALLEST GRAMMAR PROBLEM
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/193597

id CONICETDig_31e3abfec6bf17d0e9b51b64febe2449
oai_identifier_str oai:ri.conicet.gov.ar:11336/193597
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
dc.title.none.fl_str_mv The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing
dc.creator.none.fl_str_mv Carrascosa, Rafael
Coste, François
Galle, Matthias
Infante Lopez, Gabriel Gaston
dc.subject.none.fl_str_mv DATA DISCOVERY
HIERARCHICAL STRUCTURE INFERENCE
OPTIMAL PARSING
SMALLEST GRAMMAR PROBLEM
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2011
dc.date.none.fl_str_mv 2011-10
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/193597
Carrascosa, Rafael; Coste, François; Galle, Matthias; Infante Lopez, Gabriel Gaston; The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing; MDPI; Algorithms; 4; 4; 10-2011; 262-284
1999-4893
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/http://www.mdpi.com/1999-4893/4/4/262/
info:eu-repo/semantics/altIdentifier/doi/10.3390/a4040262
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv MDPI
publisher.none.fl_str_mv MDPI
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar