Reducing hardware hit by queries in web search engines

Authors
Mendoza, Marcelo; Marin, Mauricio; Gil Costa, Graciela Verónica; Ferrarotti, Flavio
Publication year
2016
Language
English
Resource type
article
Status
published version
Description
In this paper, we introduce a new collection selection strategy for search engines with document-partitioned indexes. Our method selects the document partitions that are most likely to deliver the best results for each query, reducing the number of queries submitted to each partition. The method employs learning algorithms that rank the partitions so as to maximize the probability of retrieving documents with high gain. It operates by building vector representations of each partition in the term space spanned by the queries. The proposed method generalizes to new queries and produces document lists with high precision for queries not seen during the training phase. To update the representations of each partition, our method employs incremental learning strategies. Beginning with an inversion test of the partition lists, we identify queries that contribute new information and add them to the training phase. The experimental results show that our collection selection method compares favorably with state-of-the-art methods. In addition, our method achieves suitable performance with low parameter sensitivity, making it applicable to search engines with hundreds of partitions.
Affiliation: Mendoza, Marcelo. Universidad Técnica Federico Santa María; Chile
Affiliation: Marin, Mauricio. Universidad de Santiago de Chile; Chile
Affiliation: Gil Costa, Graciela Verónica. Universidad Nacional de San Luis; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Ferrarotti, Flavio. Software Competence Center Hagenberg; Austria
Subject
Distributed Information Retrieval
Incremental Learning
Query Routing
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/60466

id CONICETDig_cf0e49ee97fb9310f16709a1c4948672
oai_identifier_str oai:ri.conicet.gov.ar:11336/60466
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2016
dc.date.none.fl_str_mv 2016-11
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/60466
Mendoza, Marcelo; Marin, Mauricio; Gil Costa, Graciela Verónica; Ferrarotti, Flavio; Reducing hardware hit by queries in web search engines; Pergamon-Elsevier Science Ltd; Information Processing & Management; 52; 6; 11-2016; 1031-1052
0306-4573
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/doi/10.1016/j.ipm.2016.04.008
info:eu-repo/semantics/altIdentifier/url/https://www.sciencedirect.com/science/article/pii/S0306457316300899
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv Pergamon-Elsevier Science Ltd
publisher.none.fl_str_mv Pergamon-Elsevier Science Ltd
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar