Modelling Efficient Novelty-based Search Result Diversification in Metric Spaces

Authors
Gil Costa, Graciela Verónica; Santos, Rodrygo L. T.; Macdonald, Craig; Ounis, Iadh
Year of publication
2013
Language
English
Resource type
article
Status
published version
Description
Novelty-based diversification provides a way to tackle ambiguous queries by re-ranking a set of retrieved documents. Current approaches are typically greedy, requiring O(n²) document–document comparisons in order to diversify a ranking of n documents. In this article, we introduce a new approach for novelty-based search result diversification that reduces the overhead incurred by document–document comparisons. To this end, we model novelty promotion as a similarity search in a metric space, exploiting the properties of this space to efficiently identify novel documents. We investigate three different approaches: pivoting-based, clustering-based, and permutation-based. In the first two, a novel document is one that lies outside the range of a pivot or outside a cluster, respectively. In the third, a novel document is one whose signature (i.e., the document's relative distance to a distinguished set of fixed objects called permutants) differs from those of previously selected documents. Thorough experiments using two TREC test collections for diversity evaluation, as well as a large sample of the query stream of a commercial search engine, show that our approaches perform at least as effectively as well-known novelty-based diversification approaches in the literature, while dramatically improving their efficiency.
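The permutation-based idea in the description can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes documents are plain numeric vectors already sorted by relevance, uses Euclidean distance as the metric, and treats a signature as the first `prefix_len` permutants ordered by their distance to the document. The names `signature`, `diversify_by_permutation` and `prefix_len` are hypothetical. A document is kept only if its signature has not been seen before, so the novelty test is a hash-set lookup rather than a series of document–document comparisons.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def signature(doc: Vector, permutants: List[Vector], prefix_len: int) -> Tuple[int, ...]:
    """Return the indices of the permutants ordered from closest to farthest
    from `doc` (Euclidean distance, ties broken by index), truncated to the
    first `prefix_len` entries. Hypothetical helper for illustration."""
    order = sorted(range(len(permutants)),
                   key=lambda i: (math.dist(doc, permutants[i]), i))
    return tuple(order[:prefix_len])


def diversify_by_permutation(ranked_docs: List[Vector], permutants: List[Vector],
                             k: int, prefix_len: int = 2) -> List[int]:
    """Scan documents in their original (relevance) order and keep a document
    only if its signature differs from every signature kept so far.

    Each candidate costs len(permutants) distance computations plus one set
    lookup, rather than a comparison against every already-selected document.
    Returns fewer than k indices if there are not enough distinct signatures.
    """
    seen: Set[Tuple[int, ...]] = set()
    selected: List[int] = []
    for idx, doc in enumerate(ranked_docs):
        if len(selected) == k:
            break
        sig = signature(doc, permutants, prefix_len)
        if sig not in seen:
            seen.add(sig)
            selected.append(idx)
    return selected


if __name__ == "__main__":
    # Five toy documents, already sorted by relevance, and three permutants.
    docs = [(0.0, 0.1), (0.1, 0.0), (4.9, 5.1), (0.2, 0.2), (9.9, 0.1)]
    permutants = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
    print(diversify_by_permutation(docs, permutants, k=3))  # [0, 2, 4]
```

In the toy run, documents 1 and 3 share document 0's signature (closest to the first permutant, then the second) and are skipped as redundant. A short signature prefix makes the novelty test coarser; a longer one distinguishes more documents. The sketch needs Python 3.8 or later for `math.dist`.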
Affiliation: Gil Costa, Graciela Verónica. Yahoo; Mexico. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico San Luis; Argentina
Affiliation: Santos, Rodrygo L. T. University of Glasgow; United Kingdom
Affiliation: Macdonald, Craig. University of Glasgow; United Kingdom
Affiliation: Ounis, Iadh. University of Glasgow; United Kingdom
Subject
Similarity Search
Diversification
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/7075

Citation
Gil Costa, Graciela Verónica; Santos, Rodrygo L. T.; Macdonald, Craig; Ounis, Iadh. Modelling Efficient Novelty-based Search Result Diversification in Metric Spaces. Journal of Discrete Algorithms, 18 (January 2013), 75-88. Elsevier.
ISSN
1570-8667
Handle
http://hdl.handle.net/11336/7075
Publisher URL
http://www.sciencedirect.com/science/article/pii/S1570866712001074
DOI
10.1016/j.jda.2012.07.004
Format
application/pdf
Publisher
Elsevier