Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies

Autores
Ríssola, Esteban A.; Tolosa, Gabriel Hernán
Año de publicación
2016
Idioma
inglés
Tipo de recurso
artículo
Estado
versión publicada
Descripción
The impressive rise of user-generated content on the web in the hands of sites like Twitter imposes new challenges to search systems. The concept of real-time search emerges, increasing the role that efficient indexing and retrieval algorithms play in this scenario. Thousands of new updates need to be processed in the very moment they are generated and users expect content to be “searchable” within seconds. This lead to the develop of efficient data structures and algorithms that may face this challenge efficiently. In this work, we introduce the concept of index entry invalidator, a strategy responsible for keeping track of the evolution of the underlying vocabulary and selectively invalidate and evict those inverted index entries that do not considerably degrade retrieval effectiveness. Consequently, the index becomes smaller and may increase overall efficiency. We introduce and evaluate two approaches based on Time-to-Live and Sliding Windows criteria. We also study the dynamics of the vocabulary using a real dataset while the evaluation is carry out using a search engine specifically designed for real-time indexing and search.
Facultad de Informática
Materia
Ciencias Informáticas
Real time
Indexing methods
Information Search and Retrieval
Nivel de accesibilidad
acceso abierto
Condiciones de uso
http://creativecommons.org/licenses/by/3.0/
Repositorio
SEDICI (UNLP)
Institución
Universidad Nacional de La Plata
OAI Identificador
oai:sedici.unlp.edu.ar:10915/52377

id SEDICI_11ae6bf0e855ab60b1cd2314b9b619b1
oai_identifier_str oai:sedici.unlp.edu.ar:10915/52377
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
spelling Improving Real Time Search Performance using Inverted Index Entries Invalidation StrategiesRíssola, Esteban A.Tolosa, Gabriel HernánCiencias InformáticasReal timeIndexing methodsInformation Search and RetrievalThe impressive rise of user-generated content on the web in the hands of sites like Twitter imposes new challenges to search systems. The concept of real-time search emerges, increasing the role that efficient indexing and retrieval algorithms play in this scenario. Thousands of new updates need to be processed in the very moment they are generated and users expect content to be “searchable” within seconds. This lead to the develop of efficient data structures and algorithms that may face this challenge efficiently. In this work, we introduce the concept of index entry invalidator, a strategy responsible for keeping track of the evolution of the underlying vocabulary and selectively invalidate and evict those inverted index entries that do not considerably degrade retrieval effectiveness. Consequently, the index becomes smaller and may increase overall efficiency. We introduce and evaluate two approaches based on Time-to-Live and Sliding Windows criteria. We also study the dynamics of the vocabulary using a real dataset while the evaluation is carry out using a search engine specifically designed for real-time indexing and search.Facultad de Informática2016-04info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionArticulohttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdf6-13http://sedici.unlp.edu.ar/handle/10915/52377enginfo:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/2015/10/JCST-42-Paper-2.pdfinfo:eu-repo/semantics/altIdentifier/issn/1666-6038info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by/3.0/Creative Commons Attribution 3.0 Unported (CC BY 3.0)reponame:SEDICI (UNLP)instname:Universidad Nacional de La Platainstacron:UNLP2025-09-29T11:04:39Zoai:sedici.unlp.edu.ar:10915/52377Institucionalhttp://sedici.unlp.edu.ar/Universidad públicaNo correspondehttp://sedici.unlp.edu.ar/oai/snrdalira@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:13292025-09-29 11:04:40.08SEDICI (UNLP) - Universidad Nacional de La Platafalse
dc.title.none.fl_str_mv Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies
title Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies
spellingShingle Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies
Ríssola, Esteban A.
Ciencias Informáticas
Real time
Indexing methods
Information Search and Retrieval
title_short Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies
title_full Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies
title_fullStr Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies
title_full_unstemmed Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies
title_sort Improving Real Time Search Performance using Inverted Index Entries Invalidation Strategies
dc.creator.none.fl_str_mv Ríssola, Esteban A.
Tolosa, Gabriel Hernán
author Ríssola, Esteban A.
author_facet Ríssola, Esteban A.
Tolosa, Gabriel Hernán
author_role author
author2 Tolosa, Gabriel Hernán
author2_role author
dc.subject.none.fl_str_mv Ciencias Informáticas
Real time
Indexing methods
Information Search and Retrieval
topic Ciencias Informáticas
Real time
Indexing methods
Information Search and Retrieval
dc.description.none.fl_txt_mv The impressive rise of user-generated content on the web in the hands of sites like Twitter imposes new challenges to search systems. The concept of real-time search emerges, increasing the role that efficient indexing and retrieval algorithms play in this scenario. Thousands of new updates need to be processed in the very moment they are generated and users expect content to be “searchable” within seconds. This lead to the develop of efficient data structures and algorithms that may face this challenge efficiently. In this work, we introduce the concept of index entry invalidator, a strategy responsible for keeping track of the evolution of the underlying vocabulary and selectively invalidate and evict those inverted index entries that do not considerably degrade retrieval effectiveness. Consequently, the index becomes smaller and may increase overall efficiency. We introduce and evaluate two approaches based on Time-to-Live and Sliding Windows criteria. We also study the dynamics of the vocabulary using a real dataset while the evaluation is carry out using a search engine specifically designed for real-time indexing and search.
Facultad de Informática
description The impressive rise of user-generated content on the web in the hands of sites like Twitter imposes new challenges to search systems. The concept of real-time search emerges, increasing the role that efficient indexing and retrieval algorithms play in this scenario. Thousands of new updates need to be processed in the very moment they are generated and users expect content to be “searchable” within seconds. This lead to the develop of efficient data structures and algorithms that may face this challenge efficiently. In this work, we introduce the concept of index entry invalidator, a strategy responsible for keeping track of the evolution of the underlying vocabulary and selectively invalidate and evict those inverted index entries that do not considerably degrade retrieval effectiveness. Consequently, the index becomes smaller and may increase overall efficiency. We introduce and evaluate two approaches based on Time-to-Live and Sliding Windows criteria. We also study the dynamics of the vocabulary using a real dataset while the evaluation is carry out using a search engine specifically designed for real-time indexing and search.
publishDate 2016
dc.date.none.fl_str_mv 2016-04
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Articulo
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/52377
url http://sedici.unlp.edu.ar/handle/10915/52377
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/2015/10/JCST-42-Paper-2.pdf
info:eu-repo/semantics/altIdentifier/issn/1666-6038
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by/3.0/
Creative Commons Attribution 3.0 Unported (CC BY 3.0)
eu_rights_str_mv openAccess
rights_invalid_str_mv http://creativecommons.org/licenses/by/3.0/
Creative Commons Attribution 3.0 Unported (CC BY 3.0)
dc.format.none.fl_str_mv application/pdf
6-13
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
reponame_str SEDICI (UNLP)
collection SEDICI (UNLP)
instname_str Universidad Nacional de La Plata
instacron_str UNLP
institution UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar
_version_ 1844615915454332928
score 13.070432