The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing
- Authors
- Carrascosa, Rafael; Coste, François; Galle, Matthias; Infante Lopez, Gabriel Gaston
- Publication year
- 2011
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- The smallest grammar problem, namely finding a smallest context-free grammar that generates exactly one sequence, is of practical and theoretical importance in fields such as Kolmogorov complexity, data compression and pattern discovery. We propose a new perspective on this problem by splitting it into two tasks: (1) choosing which words will be the constituents of the grammar and (2) searching for the smallest grammar given this set of constituents. We show how to solve the second task in polynomial time by parsing longer constituents with smaller ones. We propose new algorithms, based on classical practical algorithms, that use this optimization to find small grammars. Our algorithms consistently find smaller grammars on a classical benchmark, reducing the size by 10% in some cases. Moreover, our formulation allows us to define interesting bounds on the number of small grammars and to empirically compare different grammars of small size.
Affiliation: Carrascosa, Rafael. Universidad Nacional de Córdoba; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina
Affiliation: Coste, François. Institut National de Recherche en Informatique et en Automatique; France
Affiliation: Galle, Matthias. Institut National de Recherche en Informatique et en Automatique; France
Affiliation: Infante Lopez, Gabriel Gaston. Universidad Nacional de Córdoba; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina
- Subject
- DATA DISCOVERY; HIERARCHICAL STRUCTURE INFERENCE; OPTIMAL PARSING; SMALLEST GRAMMAR PROBLEM
- Accessibility level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/193597
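
The second task named in the Description (finding the smallest grammar once the constituents are fixed, by parsing longer constituents with smaller ones) can be illustrated with a short dynamic-programming sketch. This is not the authors' implementation. The function names are hypothetical, and two choices are assumptions made only for illustration: every supplied constituent gets its own rule, and grammar size is measured as the total number of symbols on all right-hand sides.

```python
def shortest_parse(word, constituents):
    """Shortest right-hand side for `word`, built from single characters
    (terminals) and from other constituents (non-terminals)."""
    n = len(word)
    cost = [0] * (n + 1)   # cost[i] = fewest symbols needed to spell word[i:]
    step = [None] * n      # step[i] = symbol chosen at position i
    for i in range(n - 1, -1, -1):
        # Default choice: emit the single character at position i.
        cost[i], step[i] = cost[i + 1] + 1, word[i]
        for c in constituents:
            # A constituent may be used only inside a strictly longer word.
            if len(c) > 1 and c != word and word.startswith(c, i):
                if 1 + cost[i + len(c)] < cost[i]:
                    cost[i], step[i] = 1 + cost[i + len(c)], c
    # Follow the recorded choices from left to right.
    rhs, i = [], 0
    while i < n:
        rhs.append(step[i])
        i += len(step[i])
    return rhs


def minimal_grammar_parsing(sequence, constituents):
    """One rule per constituent plus one for the whole sequence (assumption)."""
    words = set(constituents) | {sequence}
    return {w: shortest_parse(w, words) for w in words}


def grammar_size(rules):
    """Assumed size measure: total length of all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())


rules = minimal_grammar_parsing("abcabcab", {"abc", "ab"})
size = grammar_size(rules)
print(rules["abcabcab"], rules["abc"], rules["ab"], size)
```

Each word is parsed independently with one right-to-left pass, so the total work is polynomial in the length of the sequence and the number of constituents. Choosing which constituents to supply (the first task) is left outside this sketch.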
id | CONICETDig_31e3abfec6bf17d0e9b51b64febe2449 |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/193597 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing |
dc.creator.none.fl_str_mv | Carrascosa, Rafael; Coste, François; Galle, Matthias; Infante Lopez, Gabriel Gaston |
author | Carrascosa, Rafael |
author_facet | Carrascosa, Rafael; Coste, François; Galle, Matthias; Infante Lopez, Gabriel Gaston |
author_role | author |
author2 | Coste, François; Galle, Matthias; Infante Lopez, Gabriel Gaston |
author2_role | author; author; author |
dc.subject.none.fl_str_mv | DATA DISCOVERY; HIERARCHICAL STRUCTURE INFERENCE; OPTIMAL PARSING; SMALLEST GRAMMAR PROBLEM |
topic | DATA DISCOVERY; HIERARCHICAL STRUCTURE INFERENCE; OPTIMAL PARSING; SMALLEST GRAMMAR PROBLEM |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2; https://purl.org/becyt/ford/1 |
publishDate | 2011 |
dc.date.none.fl_str_mv | 2011-10 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/193597; Carrascosa, Rafael; Coste, François; Galle, Matthias; Infante Lopez, Gabriel Gaston; The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing; MDPI; Algorithms; 4; 4; 10-2011; 262-284; 1999-4893; CONICET Digital; CONICET |
url | http://hdl.handle.net/11336/193597 |
identifier_str_mv | Carrascosa, Rafael; Coste, François; Galle, Matthias; Infante Lopez, Gabriel Gaston; The Smallest Grammar Problem as Constituents Choice and Minimal Grammar Parsing; MDPI; Algorithms; 4; 4; 10-2011; 262-284; 1999-4893; CONICET Digital; CONICET |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/http://www.mdpi.com/1999-4893/4/4/262/; info:eu-repo/semantics/altIdentifier/doi/10.3390/a4040262 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf; application/pdf |
dc.publisher.none.fl_str_mv | MDPI |
publisher.none.fl_str_mv | MDPI |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET); instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |