Composite Retrieval of Diverse and Complementary Bundles

Authors
Amer Yahia, Sihem; Bonchi, Francesco; Castillo, Carlos; Feuerstein, Esteban Zindel; Méndez-Díaz, Isabel; Zabala, Paula Lorena
Publication year
2014
Language
English
Resource type
article
Status
published version
Description
Users are often faced with the problem of finding complementary items that together achieve a single common goal (e.g., a starter kit for a novice astronomer, a collection of questions and answers related to low-carb nutrition, a set of places to visit on holidays). In this paper, we argue that for some application scenarios returning item bundles is more appropriate than returning ranked lists. We therefore define composite retrieval as the problem of finding k bundles of complementary items. Beyond complementarity of items, the bundles must be valid w.r.t. a given budget, and the answer set of k bundles must exhibit diversity. We formally define the problem and show that it is NP-hard in its general form, and that the special cases in which each bundle consists of a single item, or in which only one bundle is sought, are also hard. Our characterization, however, suggests a two-phase approach (Produce-and-Choose, or PAC) in which we first produce many valid bundles and then choose k among them. For the first phase we devise two ad-hoc clustering algorithms, while for the second phase we adapt heuristics with approximation guarantees for a related problem. We also devise another approach, based on first finding a k-clustering and then selecting a valid bundle from each of the produced clusters (Cluster-and-Pick, or CAP). We experimentally compare the proposed methods on two real-world data sets: the first is a sample of tourist attractions in 10 large European cities, while the second is a large database of user-generated restaurant reviews from Yahoo! Local. Our experiments show that when diversity is highly important, CAP is the best option, while when diversity is less important, a PAC approach that constructs bundles around randomly chosen pivots is better.
Affiliation: Amer Yahia, Sihem. Centre National de la Recherche Scientifique; France
Affiliation: Bonchi, Francesco. Yahoo Labs; Spain
Affiliation: Castillo, Carlos. Qatar Computing Research Institute; Qatar
Affiliation: Feuerstein, Esteban Zindel. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Méndez-Díaz, Isabel. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Zabala, Paula Lorena. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Subject
Composite Retrieval
Complementarity
Diversity
Maximum Edge Subgraph
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/33093

purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2014
dc.date.none.fl_str_mv 2014-11
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/33093
Bonchi, Francesco; Castillo, Carlos; Zabala, Paula Lorena; Amer Yahia, Sihem; Feuerstein, Esteban Zindel; Méndez-Díaz, Isabel; et al.; Composite Retrieval of Diverse and Complementary Bundles; IEEE Computer Society; IEEE Transactions on Knowledge and Data Engineering; 26; 11; 11-2014; 2662-2675
1041-4347
CONICET Digital
CONICET
url http://hdl.handle.net/11336/33093
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/doi/10.1109/TKDE.2014.2306678
info:eu-repo/semantics/altIdentifier/url/http://ieeexplore.ieee.org/document/6742606/
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv IEEE Computer Society
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar