Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies
- Autores
- Ríssola, Esteban A.; Tolosa, Gabriel Hernán
- Año de publicación
- 2016
- Idioma
- inglés
- Tipo de recurso
- artículo
- Estado
- versión publicada
- Descripción
- The impressive rise of user-generated content on the web in the hands of sites like Twitter imposes new challenges to search systems. The concept of real-time search emerges, increasing the role that efficient indexing and retrieval algorithms play in this scenario. Thousands of new updates need to be processed in the very moment they are generated and users expect content to be “searchable” within seconds. This lead to the develop of efficient data structures and algorithms that may face this challenge efficiently. In this work, we introduce the concept of index entry invalidator, a strategy responsible for keeping track of the evolution of the underlying vocabulary and selectively invalidate and evict those inverted index entries that do not considerably degrade retrieval effectiveness. Consequently, the index becomes smaller and may increase overall efficiency. We introduce and evaluate two approaches based on Time-to-Live and Sliding Windows criteria. We also study the dynamics of the vocabulary using a real dataset while the evaluation is carry out using a search engine specifically designed for real-time indexing and search.
Facultad de Informática - Materia
-
Ciencias Informáticas
Real time
Indexing methods
Information Search and Retrieval - Nivel de accesibilidad
- acceso abierto
- Condiciones de uso
- http://creativecommons.org/licenses/by/3.0/
- Repositorio
- Institución
- Universidad Nacional de La Plata
- OAI Identificador
- oai:sedici.unlp.edu.ar:10915/52377
Ver los metadatos del registro completo
id |
SEDICI_11ae6bf0e855ab60b1cd2314b9b619b1 |
---|---|
oai_identifier_str |
oai:sedici.unlp.edu.ar:10915/52377 |
network_acronym_str |
SEDICI |
repository_id_str |
1329 |
network_name_str |
SEDICI (UNLP) |
spelling |
Improving Real Time Search Performance using Inverted Index Entries Invalidation StrategiesRíssola, Esteban A.Tolosa, Gabriel HernánCiencias InformáticasReal timeIndexing methodsInformation Search and RetrievalThe impressive rise of user-generated content on the web in the hands of sites like Twitter imposes new challenges to search systems. The concept of real-time search emerges, increasing the role that efficient indexing and retrieval algorithms play in this scenario. Thousands of new updates need to be processed in the very moment they are generated and users expect content to be “searchable” within seconds. This lead to the develop of efficient data structures and algorithms that may face this challenge efficiently. In this work, we introduce the concept of index entry invalidator, a strategy responsible for keeping track of the evolution of the underlying vocabulary and selectively invalidate and evict those inverted index entries that do not considerably degrade retrieval effectiveness. Consequently, the index becomes smaller and may increase overall efficiency. We introduce and evaluate two approaches based on Time-to-Live and Sliding Windows criteria. We also study the dynamics of the vocabulary using a real dataset while the evaluation is carry out using a search engine specifically designed for real-time indexing and search.Facultad de Informática2016-04info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionArticulohttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdf6-13http://sedici.unlp.edu.ar/handle/10915/52377enginfo:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/2015/10/JCST-42-Paper-2.pdfinfo:eu-repo/semantics/altIdentifier/issn/1666-6038info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by/3.0/Creative Commons Attribution 3.0 Unported (CC BY 3.0)reponame:SEDICI (UNLP)instname:Universidad Nacional de La Platainstacron:UNLP2025-09-29T11:04:39Zoai:sedici.unlp.edu.ar:10915/52377Institucionalhttp://sedici.unlp.edu.ar/Universidad públicaNo correspondehttp://sedici.unlp.edu.ar/oai/snrdalira@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:13292025-09-29 11:04:40.08SEDICI (UNLP) - Universidad Nacional de La Platafalse |
dc.title.none.fl_str_mv |
Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies |
title |
Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies |
spellingShingle |
Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies Ríssola, Esteban A. Ciencias Informáticas Real time Indexing methods Information Search and Retrieval |
title_short |
Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies |
title_full |
Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies |
title_fullStr |
Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies |
title_full_unstemmed |
Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies |
title_sort |
Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies |
dc.creator.none.fl_str_mv |
Ríssola, Esteban A. Tolosa, Gabriel Hernán |
author |
Ríssola, Esteban A. |
author_facet |
Ríssola, Esteban A. Tolosa, Gabriel Hernán |
author_role |
author |
author2 |
Tolosa, Gabriel Hernán |
author2_role |
author |
dc.subject.none.fl_str_mv |
Ciencias Informáticas Real time Indexing methods Information Search and Retrieval |
topic |
Ciencias Informáticas Real time Indexing methods Information Search and Retrieval |
dc.description.none.fl_txt_mv |
The impressive rise of user-generated content on the web in the hands of sites like Twitter imposes new challenges to search systems. The concept of real-time search emerges, increasing the role that efficient indexing and retrieval algorithms play in this scenario. Thousands of new updates need to be processed in the very moment they are generated and users expect content to be “searchable” within seconds. This lead to the develop of efficient data structures and algorithms that may face this challenge efficiently. In this work, we introduce the concept of index entry invalidator, a strategy responsible for keeping track of the evolution of the underlying vocabulary and selectively invalidate and evict those inverted index entries that do not considerably degrade retrieval effectiveness. Consequently, the index becomes smaller and may increase overall efficiency. We introduce and evaluate two approaches based on Time-to-Live and Sliding Windows criteria. We also study the dynamics of the vocabulary using a real dataset while the evaluation is carry out using a search engine specifically designed for real-time indexing and search. Facultad de Informática |
description |
The impressive rise of user-generated content on the web in the hands of sites like Twitter imposes new challenges to search systems. The concept of real-time search emerges, increasing the role that efficient indexing and retrieval algorithms play in this scenario. Thousands of new updates need to be processed in the very moment they are generated and users expect content to be “searchable” within seconds. This lead to the develop of efficient data structures and algorithms that may face this challenge efficiently. In this work, we introduce the concept of index entry invalidator, a strategy responsible for keeping track of the evolution of the underlying vocabulary and selectively invalidate and evict those inverted index entries that do not considerably degrade retrieval effectiveness. Consequently, the index becomes smaller and may increase overall efficiency. We introduce and evaluate two approaches based on Time-to-Live and Sliding Windows criteria. We also study the dynamics of the vocabulary using a real dataset while the evaluation is carry out using a search engine specifically designed for real-time indexing and search. |
publishDate |
2016 |
dc.date.none.fl_str_mv |
2016-04 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion Articulo http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://sedici.unlp.edu.ar/handle/10915/52377 |
url |
http://sedici.unlp.edu.ar/handle/10915/52377 |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/2015/10/JCST-42-Paper-2.pdf info:eu-repo/semantics/altIdentifier/issn/1666-6038 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/3.0/ Creative Commons Attribution 3.0 Unported (CC BY 3.0) |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by/3.0/ Creative Commons Attribution 3.0 Unported (CC BY 3.0) |
dc.format.none.fl_str_mv |
application/pdf 6-13 |
dc.source.none.fl_str_mv |
reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP |
reponame_str |
SEDICI (UNLP) |
collection |
SEDICI (UNLP) |
instname_str |
Universidad Nacional de La Plata |
instacron_str |
UNLP |
institution |
UNLP |
repository.name.fl_str_mv |
SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv |
alira@sedici.unlp.edu.ar |
_version_ |
1844615915454332928 |
score |
13.070432 |