Efficient repeat finding in sets of strings via suffix arrays
- Authors
- Barenbaum, Pablo; Becher, Veronica Andrea; Deymonnaz, Alejandro; Halsband, Melisa; Heiber, Pablo Ariel
- Year of publication
- 2013
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the whole input size (the sum of the lengths of all strings in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that evidence the efficiency of our algorithms in practice, even for very large inputs.
Affiliation: Barenbaum, Pablo. Consejo Nacional de Investigaciones Cientificas y Tecnicas. Oficina de Coordinacion Administrativa Ciudad Universitaria; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Becher, Veronica Andrea. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Cientificas y Tecnicas. Oficina de Coordinacion Administrativa Ciudad Universitaria; Argentina
Affiliation: Deymonnaz, Alejandro. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Halsband, Melisa. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Heiber, Pablo Ariel. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Cientificas y Tecnicas. Oficina de Coordinacion Administrativa Ciudad Universitaria; Argentina
- Subject
- Stringology
- Repeats
- Suffix Array
- Longest Maximal Substring
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/2502
| id | CONICETDig_ca0d3a1389b9effdf79f67fdf4383a3c |
|---|---|
| oai_identifier_str | oai:ri.conicet.gov.ar:11336/2502 |
| network_acronym_str | CONICETDig |
| repository_id_str | 3498 |
| network_name_str | CONICET Digital (CONICET) |
| dc.title.none.fl_str_mv | Efficient repeat finding in sets of strings via suffix arrays |
| title | Efficient repeat finding in sets of strings via suffix arrays |
| spellingShingle | Efficient repeat finding in sets of strings via suffix arrays; Barenbaum, Pablo; Stringology; Repeats; Suffix Array; Longest Maximal Substring |
| title_short | Efficient repeat finding in sets of strings via suffix arrays |
| title_full | Efficient repeat finding in sets of strings via suffix arrays |
| title_fullStr | Efficient repeat finding in sets of strings via suffix arrays |
| title_full_unstemmed | Efficient repeat finding in sets of strings via suffix arrays |
| title_sort | Efficient repeat finding in sets of strings via suffix arrays |
| dc.creator.none.fl_str_mv | Barenbaum, Pablo; Becher, Veronica Andrea; Deymonnaz, Alejandro; Halsband, Melisa; Heiber, Pablo Ariel |
| author | Barenbaum, Pablo |
| author_facet | Barenbaum, Pablo; Becher, Veronica Andrea; Deymonnaz, Alejandro; Halsband, Melisa; Heiber, Pablo Ariel |
| author_role | author |
| author2 | Becher, Veronica Andrea; Deymonnaz, Alejandro; Halsband, Melisa; Heiber, Pablo Ariel |
| author2_role | author; author; author; author |
| dc.subject.none.fl_str_mv | Stringology; Repeats; Suffix Array; Longest Maximal Substring |
| topic | Stringology; Repeats; Suffix Array; Longest Maximal Substring |
| purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
| dc.description.none.fl_txt_mv | We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the whole input size (the sum of the lengths of all strings in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that evidence the efficiency of our algorithms in practice, even for very large inputs. Affiliation: Barenbaum, Pablo. Consejo Nacional de Investigaciones Cientificas y Tecnicas. Oficina de Coordinacion Administrativa Ciudad Universitaria; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina Affiliation: Becher, Veronica Andrea. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Cientificas y Tecnicas. Oficina de Coordinacion Administrativa Ciudad Universitaria; Argentina Affiliation: Deymonnaz, Alejandro. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina Affiliation: Halsband, Melisa. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina Affiliation: Heiber, Pablo Ariel. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Cientificas y Tecnicas. Oficina de Coordinacion Administrativa Ciudad Universitaria; Argentina |
| description | We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the whole input size (the sum of the lengths of all strings in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that evidence the efficiency of our algorithms in practice, even for very large inputs. |
| publishDate | 2013 |
| dc.date.none.fl_str_mv | 2013-04 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
| format | article |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/2502 Barenbaum, Pablo; Becher, Veronica Andrea; Deymonnaz, Alejandro; Halsband, Melisa; Heiber, Pablo Ariel; Efficient repeat finding in sets of strings via suffix arrays; Discrete Mathematics and Theoretical Computer Science; Discrete Mathematics and Theoretical Computer Science; 15; 2; 4-2013; 59-70 1365-8050 |
| url | http://hdl.handle.net/11336/2502 |
| identifier_str_mv | Barenbaum, Pablo; Becher, Veronica Andrea; Deymonnaz, Alejandro; Halsband, Melisa; Heiber, Pablo Ariel; Efficient repeat finding in sets of strings via suffix arrays; Discrete Mathematics and Theoretical Computer Science; Discrete Mathematics and Theoretical Computer Science; 15; 2; 4-2013; 59-70 1365-8050 |
| dc.language.none.fl_str_mv | eng |
| language | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://hal.inria.fr/hal-00980753 info:eu-repo/semantics/altIdentifier/url/http://dmtcs.episciences.org/597 info:eu-repo/semantics/altIdentifier/url/http://www.dmtcs.org/dmtcs-ojs/index.php/dmtcs/article/view/2130 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| eu_rights_str_mv | openAccess |
| rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
| dc.format.none.fl_str_mv | application/pdf application/pdf |
| dc.publisher.none.fl_str_mv | Discrete Mathematics and Theoretical Computer Science |
| publisher.none.fl_str_mv | Discrete Mathematics and Theoretical Computer Science |
| dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| reponame_str | CONICET Digital (CONICET) |
| collection | CONICET Digital (CONICET) |
| instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
| _version_ | 1846083090823249920 |
| score | 13.22299 |
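
Illustrative sketch (not part of the record). Problem (a) in the description, finding the largest substrings that occur in every string of a set, can be illustrated with a small suffix array example. The Python code below is a minimal sketch under stated assumptions and is not the authors' algorithm: it builds each suffix array by naively sorting suffixes, so it meets neither the O(m) memory nor the O(n log m) time bound quoted in the abstract. All function names (`suffix_array`, `longest_match`, `longest_common_substrings`) are hypothetical. The one idea it borrows from the description is that only one suffix array is held at a time while the strings are processed in turn.

```python
def suffix_array(s):
    """Start positions of the suffixes of s in lexicographic order.
    Naive construction (assumption for brevity): sorts the suffix texts directly."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(p, t, sa):
    """Length of the longest prefix of p that occurs somewhere in t,
    where sa is the suffix array of t."""
    lo, hi = 0, len(sa)
    while lo < hi:  # binary search for the insertion point of p among the suffixes
        mid = (lo + hi) // 2
        if t[sa[mid]:] < p:
            lo = mid + 1
        else:
            hi = mid
    # The suffix sharing the longest prefix with p is adjacent to that point.
    best = 0
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            best = max(best, common_prefix_len(p, t[sa[k]:]))
    return best


def longest_common_substrings(strings):
    """All longest substrings that occur in every string of the set, sorted."""
    ref, others = strings[0], strings[1:]
    # limit[i]: longest prefix of ref[i:] seen in every string processed so far
    limit = [len(ref) - i for i in range(len(ref))]
    for t in others:
        sa = suffix_array(t)  # only one suffix array is alive at a time
        for i in range(len(ref)):
            if limit[i]:
                limit[i] = longest_match(ref[i:i + limit[i]], t, sa)
    best = max(limit, default=0)
    if best == 0:
        return []
    return sorted({ref[i:i + best] for i in range(len(ref)) if limit[i] == best})


if __name__ == "__main__":
    print(longest_common_substrings(["abcabx", "xabcy", "zzabcab"]))
```

For a production setting, the naive sort would be replaced by a linear or near-linear suffix array construction; the search logic in `longest_match` would stay the same.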