Using Big Data Analysis to Improve Cache Performance in Search Engines
- Authors
- Tolosa, Gabriel Hernán; Feuerstein, Esteban
- Publication year
- 2015
- Language
- Spanish (Castilian)
- Resource type
- conference paper
- Status
- published version
- Description
- Web search engines process huge amounts of data to support search but must meet strict performance requirements (answering a query in a fraction of a second). To do so, they implement optimization techniques such as caching, which may be applied at several levels. One of these levels is the intersection cache, which attempts to exploit frequently occurring pairs of terms by keeping the results of intersecting the corresponding inverted lists in the memory of the search node. In this work we propose an optimization step that uses data mining techniques to decide which items should be cached and which should not. Our preliminary results show that it is possible to achieve extra cost savings in this already hyper-optimized field.
Sociedad Argentina de Informática e Investigación Operativa (SADIO)
- Subject
- Computer Science; big data; Web Search Engines (WSE); intersection caching; Search process
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-sa/3.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/51952
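
The intersection cache mentioned in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `IntersectionCache`, the LRU eviction, and the `admit` predicate are all assumptions made for illustration. The `admit` callback stands in for the proposed decision step (which the record only describes as using data mining techniques), and the toy index is invented.

```python
from collections import OrderedDict
from typing import Callable, Dict, List, Tuple


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge-intersect two sorted posting lists of document IDs."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class IntersectionCache:
    """Hypothetical LRU cache of term-pair intersections with an admission test."""

    def __init__(
        self,
        index: Dict[str, List[int]],
        capacity: int,
        admit: Callable[[Tuple[str, str]], bool] = lambda pair: True,
    ) -> None:
        self.index = index          # term -> sorted posting list
        self.capacity = capacity    # maximum number of cached pairs
        self.admit = admit          # assumed stand-in for the learned decision
        self.entries: "OrderedDict[Tuple[str, str], List[int]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, t1: str, t2: str) -> List[int]:
        # Normalize the pair so (a, b) and (b, a) share one entry.
        pair = (t1, t2) if t1 <= t2 else (t2, t1)
        if pair in self.entries:
            self.hits += 1
            self.entries.move_to_end(pair)  # mark as most recently used
            return self.entries[pair]
        self.misses += 1
        result = intersect(self.index.get(t1, []), self.index.get(t2, []))
        if self.admit(pair):
            self.entries[pair] = result
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return result


# Toy index; the admission rule below is arbitrary and purely illustrative.
index = {"big": [1, 3, 5, 8], "data": [3, 4, 5, 9], "cache": [2, 5, 8]}
cache = IntersectionCache(index, capacity=2, admit=lambda pair: "cache" not in pair)
cache.lookup("big", "data")   # miss: computed and stored
cache.lookup("data", "big")   # hit: same pair in a different order
cache.lookup("big", "cache")  # miss: computed but not admitted
```

Separating the admission test from the eviction policy keeps the sketch close to the idea in the description: the cache mechanics stay fixed while the rule for deciding which pairs to keep can be swapped out.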
Full record metadata
| id | SEDICI_22746514526f0262d97e12244b41e126 |
|---|---|
| oai_identifier_str | oai:sedici.unlp.edu.ar:10915/51952 |
| network_acronym_str | SEDICI |
| repository_id_str | 1329 |
| network_name_str | SEDICI (UNLP) |
| dc.title.none.fl_str_mv | Using Big Data Analysis to Improve Cache Performance in Search Engines |
| dc.creator.none.fl_str_mv | Tolosa, Gabriel Hernán; Feuerstein, Esteban |
| author | Tolosa, Gabriel Hernán |
| author_role | author |
| author2 | Feuerstein, Esteban |
| author2_role | author |
| dc.subject.none.fl_str_mv | Computer Science; big data; Web Search Engines (WSE); intersection caching; Search process |
| publishDate | 2015 |
| dc.date.none.fl_str_mv | 2015-09 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Conference object; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
| format | conferenceObject |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/51952 |
| dc.language.none.fl_str_mv | spa |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/http://44jaiio.sadio.org.ar/sites/default/files/agranda7-10.pdf; info:eu-repo/semantics/altIdentifier/issn/2451-7569 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-sa/3.0/; Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf; 7-10 |
| dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
| instname_str | Universidad Nacional de La Plata |
| instacron_str | UNLP |
| repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
| repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |