Distributed Text Search using Suffix Arrays
- Authors
- Arroyuelo, Diego; Bonacic, Carolina; Gil Costa, Graciela Verónica; Marín, Mauricio; Navarro, Gonzalo
- Publication year
- 2014
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- Text search is a classical problem in Computer Science, with many data-intensive applications. For this problem, suffix arrays are among the most widely known and used data structures, enabling fast searches for phrases, terms, substrings and regular expressions in large texts. Potential application domains for these operations include large-scale search services, such as Web search engines, where it is necessary to efficiently process intensive-traffic streams of on-line queries. This paper proposes strategies to enable such services by means of suffix arrays. We introduce techniques for deploying suffix arrays on clusters of distributed-memory processors and then study the processing of multiple queries on the distributed data structure. Even though the cost of individual search operations in sequential (non-distributed) suffix arrays is low in practice, the problem of processing multiple queries on distributed-memory systems, so that hardware resources are used efficiently, is relevant to services aimed at achieving high query throughput at low operational costs. Our theoretical and experimental performance studies show that our proposals are suitable solutions for building efficient and scalable on-line search services based on suffix arrays.
Affiliation: Arroyuelo, Diego. Yahoo! Labs Santiago; Chile. Universidad Técnica Federico Santa María; Chile
Affiliation: Bonacic, Carolina. Universidad de Santiago de Chile; Chile
Affiliation: Gil Costa, Graciela Verónica. Universidad Nacional de San Luis. Facultad de Ciencias Fisico-Matematicas y Naturales; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico San Luis; Argentina. Yahoo! Labs Santiago; Chile
Affiliation: Marín, Mauricio. Yahoo! Labs Santiago; Chile. Universidad de Chile; Chile
Affiliation: Navarro, Gonzalo. Universidad de Chile; Chile
- Subject
- Arreglos de Sufijos (Suffix Arrays); Sistemas Distribuidos (Distributed Systems); Distributed Text Search
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/7236
Full record metadata
Field | Value |
---|---|
id | CONICETDig_1e056a1d8897949bfc35dd28ab5f396a |
oai_identifier_str | oai:ri.conicet.gov.ar:11336/7236 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | Distributed Text Search using Suffix Arrays |
title | Distributed Text Search using Suffix Arrays |
spellingShingle | Distributed Text Search using Suffix Arrays; Arroyuelo, Diego; Arreglos de Sufijos; Sistemas Distribuidos; Distributed Text Search |
title_short | Distributed Text Search using Suffix Arrays |
title_full | Distributed Text Search using Suffix Arrays |
title_fullStr | Distributed Text Search using Suffix Arrays |
title_full_unstemmed | Distributed Text Search using Suffix Arrays |
title_sort | Distributed Text Search using Suffix Arrays |
dc.creator.none.fl_str_mv | Arroyuelo, Diego; Bonacic, Carolina; Gil Costa, Graciela Verónica; Marín, Mauricio; Navarro, Gonzalo |
author | Arroyuelo, Diego |
author_facet | Arroyuelo, Diego; Bonacic, Carolina; Gil Costa, Graciela Verónica; Marín, Mauricio; Navarro, Gonzalo |
author_role | author |
author2 | Bonacic, Carolina; Gil Costa, Graciela Verónica; Marín, Mauricio; Navarro, Gonzalo |
author2_role | author; author; author; author |
dc.subject.none.fl_str_mv | Arreglos de Sufijos; Sistemas Distribuidos; Distributed Text Search |
topic | Arreglos de Sufijos; Sistemas Distribuidos; Distributed Text Search |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/2.2; https://purl.org/becyt/ford/2 |
dc.description.none.fl_txt_mv | Text search is a classical problem in Computer Science, with many data-intensive applications. For this problem, suffix arrays are among the most widely known and used data structures, enabling fast searches for phrases, terms, substrings and regular expressions in large texts. Potential application domains for these operations include large-scale search services, such as Web search engines, where it is necessary to efficiently process intensive-traffic streams of on-line queries. This paper proposes strategies to enable such services by means of suffix arrays. We introduce techniques for deploying suffix arrays on clusters of distributed-memory processors and then study the processing of multiple queries on the distributed data structure. Even though the cost of individual search operations in sequential (non-distributed) suffix arrays is low in practice, the problem of processing multiple queries on distributed-memory systems, so that hardware resources are used efficiently, is relevant to services aimed at achieving high query throughput at low operational costs. Our theoretical and experimental performance studies show that our proposals are suitable solutions for building efficient and scalable on-line search services based on suffix arrays. Fil: Arroyuelo, Diego. Yahoo! Labs Santiago; Chile. Universidad Técnica Federico Santa María; Chile Fil: Bonacic, Carolina. Universidad de Santiago de Chile; Chile Fil: Gil Costa, Graciela Verónica. Universidad Nacional de San Luis. Facultad de Ciencias Fisico-Matematicas y Naturales; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico San Luis; Argentina. Yahoo! Labs Santiago; Chile Fil: Marín, Mauricio. Yahoo! Labs Santiago; Chile. Universidad de Chile; Chile Fil: Navarro, Gonzalo. Universidad de Chile; Chile |
description | Text search is a classical problem in Computer Science, with many data-intensive applications. For this problem, suffix arrays are among the most widely known and used data structures, enabling fast searches for phrases, terms, substrings and regular expressions in large texts. Potential application domains for these operations include large-scale search services, such as Web search engines, where it is necessary to efficiently process intensive-traffic streams of on-line queries. This paper proposes strategies to enable such services by means of suffix arrays. We introduce techniques for deploying suffix arrays on clusters of distributed-memory processors and then study the processing of multiple queries on the distributed data structure. Even though the cost of individual search operations in sequential (non-distributed) suffix arrays is low in practice, the problem of processing multiple queries on distributed-memory systems, so that hardware resources are used efficiently, is relevant to services aimed at achieving high query throughput at low operational costs. Our theoretical and experimental performance studies show that our proposals are suitable solutions for building efficient and scalable on-line search services based on suffix arrays. |
publishDate | 2014 |
dc.date.none.fl_str_mv | 2014-07-11 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/7236 Arroyuelo, Diego; Bonacic, Carolina; Gil Costa, Graciela Verónica; Marín, Mauricio; Navarro, Gonzalo; Distributed Text Search using Suffix Arrays; Elsevier Science; Parallel Computing; 40; 9; 11-7-2014; 471-495 0167-8191 |
url | http://hdl.handle.net/11336/7236 |
identifier_str_mv | Arroyuelo, Diego; Bonacic, Carolina; Gil Costa, Graciela Verónica; Marín, Mauricio; Navarro, Gonzalo; Distributed Text Search using Suffix Arrays; Elsevier Science; Parallel Computing; 40; 9; 11-7-2014; 471-495 0167-8191 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/doi/ info:eu-repo/semantics/altIdentifier/doi/10.1016/j.parco.2014.06.007 info:eu-repo/semantics/altIdentifier/url/http://www.sciencedirect.com/science/article/pii/S0167819114000805 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf application/pdf |
dc.publisher.none.fl_str_mv | Elsevier Science |
publisher.none.fl_str_mv | Elsevier Science |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
_version_ | 1844614174524571648 |
score | 13.070432 |