Distributed Text Search using Suffix Arrays

Authors
Arroyuelo, Diego; Bonacic, Carolina; Gil Costa, Graciela Verónica; Marín, Mauricio; Navarro, Gonzalo
Year of publication
2014
Language
English
Resource type
article
Status
published version
Description
Text search is a classical problem in Computer Science, with many data-intensive applications. For this problem, suffix arrays are among the most widely known and used data structures, enabling fast searches for phrases, terms, substrings and regular expressions in large texts. Potential application domains for these operations include large-scale search services, such as Web search engines, which must efficiently process high-traffic streams of online queries. This paper proposes strategies to enable such services by means of suffix arrays. We introduce techniques for deploying suffix arrays on clusters of distributed-memory processors and then study the processing of multiple queries on the distributed data structure. Even though the cost of individual search operations in sequential (non-distributed) suffix arrays is low in practice, the problem of processing multiple queries on distributed-memory systems so that hardware resources are used efficiently is relevant to services aimed at achieving high query throughput at low operational cost. Our theoretical and experimental performance studies show that our proposals are suitable solutions for building efficient and scalable online search services based on suffix arrays.
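As a rough illustration of the data structure the description refers to, the following Python sketch builds a suffix array naively, finds every occurrence of a substring with two binary searches, and then splits the array into contiguous lexicographic ranges, one per hypothetical processor, to mimic routing a batch of queries. All names (`build_suffix_array`, `suffix_range`, `partition_global`, `route`, `batch_search`) are hypothetical. The "cluster" is simulated in a single process where every partition reads the same in-memory text, and the sketch is not claimed to be the partitioning or query-processing strategy proposed in the paper.

```python
"""Toy sketch: suffix-array search and a simulated range-partitioned
suffix array. All function names are hypothetical illustrations."""


def build_suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions by the suffix itself.
    # Fine for small examples; real systems use far faster builders.
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_range(text, sa, pattern, lo=0, hi=None):
    """Return [start, end) within sa[lo:hi] of suffixes prefixed by pattern."""
    if hi is None:
        hi = len(sa)
    m = len(pattern)
    # Length-m prefixes of sorted suffixes are non-decreasing, so binary
    # search is valid. First search: leftmost prefix >= pattern.
    a, b = lo, hi
    while a < b:
        mid = (a + b) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            a = mid + 1
        else:
            b = mid
    start = a
    # Second search: leftmost prefix > pattern (end of the match range).
    b = hi
    while a < b:
        mid = (a + b) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            a = mid + 1
        else:
            b = mid
    return start, a


def occurrences(text, sa, pattern):
    """All start positions of pattern in text, in increasing order."""
    start, end = suffix_range(text, sa, pattern)
    return sorted(sa[start:end])


def partition_global(sa, num_parts):
    """Split the SA into contiguous, near-equal lexicographic ranges."""
    assert 1 <= num_parts <= len(sa), "every partition must be non-empty"
    n = len(sa)
    cuts = [n * p // num_parts for p in range(num_parts + 1)]
    return [(cuts[p], cuts[p + 1]) for p in range(num_parts)]


def route(text, sa, parts, pattern):
    """Pick partitions that may hold matches, using only the first suffix
    of each partition as its lexicographic boundary."""
    m = len(pattern)
    firsts = [text[sa[lo]:sa[lo] + m] for lo, _ in parts]
    candidates = []
    for p in range(len(parts)):
        if firsts[p] > pattern:
            continue  # all length-m prefixes in p are > pattern
        if p + 1 < len(parts) and firsts[p + 1] < pattern:
            continue  # all length-m prefixes in p are < pattern
        candidates.append(p)
    return candidates


def batch_search(text, sa, parts, queries):
    """Route a batch of queries, then let each partition answer its share."""
    inbox = {p: [] for p in range(len(parts))}
    unique = list(dict.fromkeys(queries))  # drop repeated queries
    for q in unique:
        for p in route(text, sa, parts, q):
            inbox[p].append(q)
    results = {q: [] for q in unique}
    for p, (lo, hi) in enumerate(parts):
        # On a real cluster this loop would run on processor p in parallel.
        for q in inbox[p]:
            start, end = suffix_range(text, sa, q, lo, hi)
            results[q].extend(sa[start:end])
    return {q: sorted(pos) for q, pos in results.items()}


if __name__ == "__main__":
    text = "abracadabra"
    sa = build_suffix_array(text)
    print(sa)                            # [10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]
    parts = partition_global(sa, 3)      # [(0, 3), (3, 7), (7, 11)]
    print(batch_search(text, sa, parts, ["abra", "a", "bra", "z"]))
    # {'abra': [0, 7], 'a': [0, 3, 5, 7, 10], 'bra': [1, 8], 'z': []}
```

Because all suffixes that start with a pattern form one contiguous run of the sorted array, a query whose run straddles a partition boundary is sent to each partition that may hold part of it, and each partition answers with a local binary search on its own slice. The naive construction sorts full suffix copies, so it is only suitable for small inputs.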
Affiliation: Arroyuelo, Diego. Yahoo! Labs Santiago; Chile. Universidad Técnica Federico Santa María; Chile
Affiliation: Bonacic, Carolina. Universidad de Santiago de Chile; Chile
Affiliation: Gil Costa, Graciela Verónica. Universidad Nacional de San Luis. Facultad de Ciencias Fisico-Matematicas y Naturales; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico San Luis; Argentina. Yahoo! Labs Santiago; Chile
Affiliation: Marín, Mauricio. Yahoo! Labs Santiago; Chile. Universidad de Chile; Chile
Affiliation: Navarro, Gonzalo. Universidad de Chile; Chile
Subject
Suffix Arrays
Distributed Systems
Distributed Text Search
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI Identifier
oai:ri.conicet.gov.ar:11336/7236
Publisher
Elsevier Science
Journal
Parallel Computing, vol. 40, no. 9, pp. 471-495
ISSN
0167-8191
Publication date
2014-07-11
DOI
10.1016/j.parco.2014.06.007
Handle
http://hdl.handle.net/11336/7236
Publisher URL
http://www.sciencedirect.com/science/article/pii/S0167819114000805
Format
application/pdf
Discipline (FORD classification)
https://purl.org/becyt/ford/2.2
https://purl.org/becyt/ford/2