Using Big Data Analysis to Improve Cache Performance in Search Engines

Authors
Tolosa, Gabriel Hernán; Feuerstein, Esteban
Year of publication
2015
Language
Spanish (Castilian)
Resource type
conference paper
Status
published version
Description
Web search engines process huge amounts of data to support search, but they must operate under strict performance requirements (answering a query in a fraction of a second). To meet these requirements they implement optimization techniques such as caching, which may be applied at several levels. One of these levels is the intersection cache, which attempts to exploit frequently occurring pairs of terms by keeping in the memory of the search node the results of intersecting the corresponding inverted lists. In this work we propose an optimization step that uses data mining techniques to decide which items should be cached and which should not. Our preliminary results show that it is possible to achieve extra cost savings in this already hyper-optimized field.
Sociedad Argentina de Informática e Investigación Operativa (SADIO)
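The intersection cache described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `IntersectionCache`, the LRU eviction, and the frequency threshold `min_pair_count` (a simple stand-in for the data-mining admission decision) are all assumptions made for illustration.

```python
from collections import Counter, OrderedDict


def intersect(a, b):
    """Intersect two sorted posting lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


class IntersectionCache:
    """Hypothetical cache of term-pair intersections with LRU eviction.

    Admission rule (an assumption, not the paper's method): a pair is
    cached only once it has been requested at least `min_pair_count` times.
    """

    def __init__(self, index, capacity=2, min_pair_count=2):
        self.index = index                  # term -> sorted list of doc IDs
        self.capacity = capacity            # maximum number of cached pairs
        self.min_pair_count = min_pair_count
        self.pair_counts = Counter()        # how often each pair was seen
        self.cache = OrderedDict()          # pair -> intersection result
        self.hits = 0
        self.misses = 0

    def query(self, t1, t2):
        key = tuple(sorted((t1, t2)))       # (a, b) and (b, a) share an entry
        self.pair_counts[key] += 1
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)     # mark as most recently used
            return self.cache[key]
        self.misses += 1
        result = intersect(self.index.get(t1, []), self.index.get(t2, []))
        if self.pair_counts[key] >= self.min_pair_count:
            self.cache[key] = result
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return result


if __name__ == "__main__":
    # Toy inverted index; the terms and doc IDs are made up.
    index = {"web": [1, 2, 3, 5, 8], "search": [2, 3, 5, 7], "cache": [3, 5, 9]}
    ic = IntersectionCache(index)
    for pair in [("web", "search"), ("search", "web"), ("web", "search")]:
        print(pair, ic.query(*pair))
    print("hits:", ic.hits, "misses:", ic.misses)
```

The admission threshold is where a learned or mined policy would plug in: instead of counting pair occurrences, a real system could decide admission from features extracted from query logs.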
Subject
Computer Science
big data
Web Search Engines (WSE)
intersection caching
Search process
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-sa/3.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/51952

dc.date.none.fl_str_mv 2015-09
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Conference object
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/51952
dc.language.none.fl_str_mv spa
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/http://44jaiio.sadio.org.ar/sites/default/files/agranda7-10.pdf
info:eu-repo/semantics/altIdentifier/issn/2451-7569
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-sa/3.0/
Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
dc.format.none.fl_str_mv application/pdf
7-10
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar