Efficient repeat finding in sets of strings via suffix arrays

Authors
Barenbaum, Pablo; Becher, Veronica Andrea; Deymonnaz, Alejandro; Halsband, Melisa; Heiber, Pablo Ariel
Publication year
2013
Language
English
Resource type
article
Status
published version
Description
We consider two repeat finding problems relative to sets of strings: (a) Find the largest substrings that occur in every string of a given set; (b) Find the maximal repeats in a given string that occur in no string of a given set. Our solutions are based on the suffix array construction, requiring O(m) memory, where m is the length of the longest input string, and O(n log m) time, where n is the whole input size (the sum of the lengths of all strings in the input). The most expensive part of our algorithms is the computation of several suffix arrays. We give an implementation and experimental results that evidence the efficiency of our algorithms in practice, even for very large inputs.
Affiliation: Barenbaum, Pablo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Becher, Veronica Andrea. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria; Argentina
Affiliation: Deymonnaz, Alejandro. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Halsband, Melisa. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Heiber, Pablo Ariel. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria; Argentina
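Illustration (not part of the original record): problem (a) from the description can be sketched on small inputs as follows. This is only a hedged sketch, not the authors' implementation. It assumes that "largest" means "longest", it builds suffix arrays naively by sorting suffixes, and it does not meet the O(m) memory or O(n log m) time bounds stated above. The function names suffix_array, occurs and longest_common_substrings are hypothetical.

```python
def suffix_array(s):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def occurs(pattern, s, sa):
    """Binary search the suffix array sa of s for a suffix starting with pattern."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        # Prefixes of sorted suffixes, cut to a fixed length, are also sorted.
        if s[sa[mid]:sa[mid] + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and s.startswith(pattern, sa[lo])


def longest_common_substrings(strings):
    """Return the longest substrings that occur in every string (assumed reading of problem (a))."""
    k = min(range(len(strings)), key=lambda i: len(strings[i]))
    base = strings[k]  # candidates are taken from the shortest string
    others = [(s, suffix_array(s)) for i, s in enumerate(strings) if i != k]
    best_len, best = 0, set()
    for i in range(len(base)):
        length = 0
        # Extend the candidate while it still occurs in every other string.
        while i + length < len(base) and all(
            occurs(base[i:i + length + 1], s, sa) for s, sa in others
        ):
            length += 1
        if length > best_len:
            best_len, best = length, {base[i:i + length]}
        elif length == best_len and length > 0:
            best.add(base[i:i + length])
    return sorted(best)
```

For example, longest_common_substrings(["xabcdy", "zabcdw", "abcdq"]) returns ["abcd"]. Each membership query is a binary search over one suffix array, which is the basic operation that suffix-array approaches rely on.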
Subject
Stringology
Repeats
Suffix Array
Longest Maximal Substring
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/2502

id CONICETDig_ca0d3a1389b9effdf79f67fdf4383a3c
oai_identifier_str oai:ri.conicet.gov.ar:11336/2502
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2013
dc.date.none.fl_str_mv 2013-04
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/2502
Barenbaum, Pablo; Becher, Veronica Andrea; Deymonnaz, Alejandro; Halsband, Melisa; Heiber, Pablo Ariel; Efficient repeat finding in sets of strings via suffix arrays; Discrete Mathematics and Theoretical Computer Science; Discrete Mathematics and Theoretical Computer Science; 15; 2; 4-2013; 59-70
1365-8050
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://hal.inria.fr/hal-00980753
info:eu-repo/semantics/altIdentifier/url/http://dmtcs.episciences.org/597
info:eu-repo/semantics/altIdentifier/url/http://www.dmtcs.org/dmtcs-ojs/index.php/dmtcs/article/view/2130
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv Discrete Mathematics and Theoretical Computer Science
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar