Reducing hardware hit by queries in web search engines
- Authors
- Mendoza, Marcelo; Marin, Mauricio; Gil Costa, Graciela Verónica; Ferrarotti, Flavio
- Year of publication
- 2016
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- In this paper, we introduce a new collection selection strategy for search engines with document-partitioned indexes. Our method selects the document partitions that are most likely to deliver the best results for the formulated queries, reducing the number of queries submitted to each partition. It employs learning algorithms that rank the partitions, maximizing the probability of retrieving documents with high gain. The method operates by building vector representations of each partition on the term space spanned by the queries. It generalizes to new queries and produces document lists with high precision for queries not seen during the training phase. To update the representations of each partition, the method employs incremental learning strategies: beginning with an inversion test of the partition lists, we identify queries that contribute new information and add them to the training phase. The experimental results show that our collection selection method compares favorably with state-of-the-art methods. In addition, the method achieves suitable performance with low parameter sensitivity, making it applicable to search engines with hundreds of partitions.
Affiliation: Mendoza, Marcelo. Universidad Técnica Federico Santa María; Chile
Affiliation: Marin, Mauricio. Universidad de Santiago de Chile; Chile
Affiliation: Gil Costa, Graciela Verónica. Universidad Nacional de San Luis; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Ferrarotti, Flavio. Software Competence Center Hagenberg; Austria
- Subject
- Distributed Information Retrieval; Incremental Learning; Query Routing
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/60466
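
The Description above mentions vector representations of each partition on the query term space and a ranking of partitions per query. As a rough illustration only, the sketch below accumulates per-partition term weights from training queries and ranks partitions for a new query by cosine similarity. This is not the authors' algorithm: the gain-weighted accumulation, the cosine scoring, the function names `build_partition_vectors` and `rank_partitions`, and the toy data are all assumptions made for this example.

```python
import math
from collections import defaultdict


def build_partition_vectors(training):
    """Build one term-weight vector per partition.

    training: iterable of (query_terms, {partition_id: gain}) pairs, where
    gain is a hypothetical measure of how useful the partition's results
    were for that query.
    """
    vectors = defaultdict(lambda: defaultdict(float))
    for terms, gains in training:
        for partition, gain in gains.items():
            for term in terms:
                # Credit each query term with the gain this partition delivered.
                vectors[partition][term] += gain
    return {p: dict(v) for p, v in vectors.items()}


def rank_partitions(query_terms, vectors, top_k=1):
    """Return the top_k partition ids by cosine similarity to the query."""
    unique_terms = set(query_terms)
    q_norm = math.sqrt(len(unique_terms))
    scores = {}
    for partition, vec in vectors.items():
        dot = sum(vec.get(t, 0.0) for t in unique_terms)
        norm = math.sqrt(sum(w * w for w in vec.values()))
        scores[partition] = dot / (norm * q_norm) if norm and q_norm else 0.0
    # Highest score first; ties broken by partition id for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


# Toy training data (invented for illustration).
training = [
    (["cheap", "flights"], {"p0": 3.0, "p1": 0.5}),
    (["flights", "chile"], {"p0": 2.0}),
    (["python", "tutorial"], {"p1": 4.0}),
]
vectors = build_partition_vectors(training)
print(rank_partitions(["flights", "santiago"], vectors, top_k=1))
```

A broker node could use such a ranking to forward a query only to the top-ranked partitions, which is the load reduction the abstract describes. The incremental update step is not sketched here because the record gives no detail on the inversion test.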
id | CONICETDig_cf0e49ee97fb9310f16709a1c4948672 |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/60466 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | Reducing hardware hit by queries in web search engines |
dc.creator.none.fl_str_mv | Mendoza, Marcelo; Marin, Mauricio; Gil Costa, Graciela Verónica; Ferrarotti, Flavio |
dc.subject.none.fl_str_mv | Distributed Information Retrieval; Incremental Learning; Query Routing |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
publishDate | 2016 |
dc.date.none.fl_str_mv | 2016-11 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/60466 Mendoza, Marcelo; Marin, Mauricio; Gil Costa, Graciela Verónica; Ferrarotti, Flavio; Reducing hardware hit by queries in web search engines; Pergamon-Elsevier Science Ltd; Information Processing & Management; 52; 6; 11-2016; 1031-1052 0306-4573 CONICET Digital CONICET |
url | http://hdl.handle.net/11336/60466 |
identifier_str_mv | Mendoza, Marcelo; Marin, Mauricio; Gil Costa, Graciela Verónica; Ferrarotti, Flavio; Reducing hardware hit by queries in web search engines; Pergamon-Elsevier Science Ltd; Information Processing & Management; 52; 6; 11-2016; 1031-1052 0306-4573 CONICET Digital CONICET |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/doi/10.1016/j.ipm.2016.04.008 info:eu-repo/semantics/altIdentifier/url/https://www.sciencedirect.com/science/article/pii/S0306457316300899 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf application/pdf |
dc.publisher.none.fl_str_mv | Pergamon-Elsevier Science Ltd |
publisher.none.fl_str_mv | Pergamon-Elsevier Science Ltd |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |