Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall

Autores
Baggio, Cecilia; Lorenzetti, Carlos Martin; Cecchini, Rocío Luján; Maguitman, Ana Gabriela
Año de publicación
2023
Idioma
inglés
Tipo de recurso
artículo
Estado
versión publicada
Descripción
Topic-based search systems retrieve items by contextualizing the information seeking process on a topic of interest to the user. A key issue in topic-based search of text resources is how to automatically generate multiple queries that reflect the topic of interest in such a way that precision, recall, and diversity are achieved. The problem of generating topic-based queries can be effectively addressed by Multi-Objective Evolutionary Algorithms, which have shown promising results. However, two common problems with such an approach are loss of diversity and low global recall when combining results from multiple queries. This work proposes a family of Multi-Objective Genetic Programming strategies based on objective functions that attempt to maximize precision and recall while minimizing the similarity among the retrieved results. To this end, we define three novel objective functions based on result set similarity and on the information theoretic notion of entropy. Extensive experiments allow us to conclude that while the proposed strategies significantly improve precision after a few generations, only some of them are able to maintain or improve global recall. A comparative analysis against previous strategies based on Multi-Objective Evolutionary Algorithms, indicates that the proposed approach is superior in terms of precision and global recall. Furthermore, when compared to query-term-selection methods based on existing state-of-the-art term-weighting schemes, the presented Multi-Objective Genetic Programming strategies demonstrate significantly higher levels of precision, recall, and F1-score, while maintaining competitive global recall. Finally, we identify the strengths and limitations of the strategies and conclude that the choice of objectives to be maximized or minimized should be guided by the application at hand.
Fil: Baggio, Cecilia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Fil: Lorenzetti, Carlos Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Fil: Cecchini, Rocío Luján. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Fil: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Materia
AUTOMATIC QUERY FORMULATION
DIVERSITY MAXIMIZATION
DIVERSITY PRESERVATION
GLOBAL RECALL
INFORMATION RETRIEVAL
INFORMATION-THEORETIC FITNESS FUNCTIONS
LEARNING COMPLEX QUERIES
MULTI-OBJECTIVE GENETIC PROGRAMMING
TOPIC-BASED SEARCH
Nivel de accesibilidad
acceso abierto
Condiciones de uso
https://creativecommons.org/licenses/by/2.5/ar/
Repositorio
CONICET Digital (CONICET)
Institución
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI Identificador
oai:ri.conicet.gov.ar:11336/225085

id CONICETDig_6e710ed63b9230e60e72ea5cce8615ec
oai_identifier_str oai:ri.conicet.gov.ar:11336/225085
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
spelling Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recallBaggio, CeciliaLorenzetti, Carlos MartinCecchini, Rocío LujánMaguitman, Ana GabrielaAUTOMATIC QUERY FORMULATIONDIVERSITY MAXIMIZATIONDIVERSITY PRESERVATIONGLOBAL RECALLINFORMATION RETRIEVALINFORMATION-THEORETIC FITNESS FUNCTIONSLEARNING COMPLEX QUERIESMULTI-OBJECTIVE GENETIC PROGRAMMINGTOPIC-BASED SEARCHhttps://purl.org/becyt/ford/1.2https://purl.org/becyt/ford/1Topic-based search systems retrieve items by contextualizing the information seeking process on a topic of interest to the user. A key issue in topic-based search of text resources is how to automatically generate multiple queries that reflect the topic of interest in such a way that precision, recall, and diversity are achieved. The problem of generating topic-based queries can be effectively addressed by Multi-Objective Evolutionary Algorithms, which have shown promising results. However, two common problems with such an approach are loss of diversity and low global recall when combining results from multiple queries. This work proposes a family of Multi-Objective Genetic Programming strategies based on objective functions that attempt to maximize precision and recall while minimizing the similarity among the retrieved results. To this end, we define three novel objective functions based on result set similarity and on the information theoretic notion of entropy. Extensive experiments allow us to conclude that while the proposed strategies significantly improve precision after a few generations, only some of them are able to maintain or improve global recall. A comparative analysis against previous strategies based on Multi-Objective Evolutionary Algorithms, indicates that the proposed approach is superior in terms of precision and global recall. Furthermore, when compared to query-term-selection methods based on existing state-of-the-art term-weighting schemes, the presented Multi-Objective Genetic Programming strategies demonstrate significantly higher levels of precision, recall, and F1-score, while maintaining competitive global recall. Finally, we identify the strengths and limitations of the strategies and conclude that the choice of objectives to be maximized or minimized should be guided by the application at hand.Fil: Baggio, Cecilia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; ArgentinaFil: Lorenzetti, Carlos Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; ArgentinaFil: Cecchini, Rocío Luján. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; ArgentinaFil: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; ArgentinaPeerJ Inc.2023-11info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionhttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdfapplication/pdfhttp://hdl.handle.net/11336/225085Baggio, Cecilia; Lorenzetti, Carlos Martin; Cecchini, Rocío Luján; Maguitman, Ana Gabriela; Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall; PeerJ Inc.; PeerJ Computer Science; 9; 11-2023; 1-392376-5992CONICET DigitalCONICETenginfo:eu-repo/semantics/altIdentifier/url/https://peerj.com/articles/cs-1710info:eu-repo/semantics/altIdentifier/doi/10.7717/peerj-cs.1710info:eu-repo/semantics/openAccesshttps://creativecommons.org/licenses/by/2.5/ar/reponame:CONICET Digital (CONICET)instname:Consejo Nacional de Investigaciones Científicas y Técnicas2025-10-15T15:42:09Zoai:ri.conicet.gov.ar:11336/225085instacron:CONICETInstitucionalhttp://ri.conicet.gov.ar/Organismo científico-tecnológicoNo correspondehttp://ri.conicet.gov.ar/oai/requestdasensio@conicet.gov.ar; lcarlino@conicet.gov.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:34982025-10-15 15:42:10.15CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicasfalse
dc.title.none.fl_str_mv Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall
title Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall
spellingShingle Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall
Baggio, Cecilia
AUTOMATIC QUERY FORMULATION
DIVERSITY MAXIMIZATION
DIVERSITY PRESERVATION
GLOBAL RECALL
INFORMATION RETRIEVAL
INFORMATION-THEORETIC FITNESS FUNCTIONS
LEARNING COMPLEX QUERIES
MULTI-OBJECTIVE GENETIC PROGRAMMING
TOPIC-BASED SEARCH
title_short Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall
title_full Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall
title_fullStr Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall
title_full_unstemmed Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall
title_sort Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall
dc.creator.none.fl_str_mv Baggio, Cecilia
Lorenzetti, Carlos Martin
Cecchini, Rocío Luján
Maguitman, Ana Gabriela
author Baggio, Cecilia
author_facet Baggio, Cecilia
Lorenzetti, Carlos Martin
Cecchini, Rocío Luján
Maguitman, Ana Gabriela
author_role author
author2 Lorenzetti, Carlos Martin
Cecchini, Rocío Luján
Maguitman, Ana Gabriela
author2_role author
author
author
dc.subject.none.fl_str_mv AUTOMATIC QUERY FORMULATION
DIVERSITY MAXIMIZATION
DIVERSITY PRESERVATION
GLOBAL RECALL
INFORMATION RETRIEVAL
INFORMATION-THEORETIC FITNESS FUNCTIONS
LEARNING COMPLEX QUERIES
MULTI-OBJECTIVE GENETIC PROGRAMMING
TOPIC-BASED SEARCH
topic AUTOMATIC QUERY FORMULATION
DIVERSITY MAXIMIZATION
DIVERSITY PRESERVATION
GLOBAL RECALL
INFORMATION RETRIEVAL
INFORMATION-THEORETIC FITNESS FUNCTIONS
LEARNING COMPLEX QUERIES
MULTI-OBJECTIVE GENETIC PROGRAMMING
TOPIC-BASED SEARCH
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
dc.description.none.fl_txt_mv Topic-based search systems retrieve items by contextualizing the information seeking process on a topic of interest to the user. A key issue in topic-based search of text resources is how to automatically generate multiple queries that reflect the topic of interest in such a way that precision, recall, and diversity are achieved. The problem of generating topic-based queries can be effectively addressed by Multi-Objective Evolutionary Algorithms, which have shown promising results. However, two common problems with such an approach are loss of diversity and low global recall when combining results from multiple queries. This work proposes a family of Multi-Objective Genetic Programming strategies based on objective functions that attempt to maximize precision and recall while minimizing the similarity among the retrieved results. To this end, we define three novel objective functions based on result set similarity and on the information theoretic notion of entropy. Extensive experiments allow us to conclude that while the proposed strategies significantly improve precision after a few generations, only some of them are able to maintain or improve global recall. A comparative analysis against previous strategies based on Multi-Objective Evolutionary Algorithms, indicates that the proposed approach is superior in terms of precision and global recall. Furthermore, when compared to query-term-selection methods based on existing state-of-the-art term-weighting schemes, the presented Multi-Objective Genetic Programming strategies demonstrate significantly higher levels of precision, recall, and F1-score, while maintaining competitive global recall. Finally, we identify the strengths and limitations of the strategies and conclude that the choice of objectives to be maximized or minimized should be guided by the application at hand.
Fil: Baggio, Cecilia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Fil: Lorenzetti, Carlos Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Fil: Cecchini, Rocío Luján. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Fil: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
description Topic-based search systems retrieve items by contextualizing the information seeking process on a topic of interest to the user. A key issue in topic-based search of text resources is how to automatically generate multiple queries that reflect the topic of interest in such a way that precision, recall, and diversity are achieved. The problem of generating topic-based queries can be effectively addressed by Multi-Objective Evolutionary Algorithms, which have shown promising results. However, two common problems with such an approach are loss of diversity and low global recall when combining results from multiple queries. This work proposes a family of Multi-Objective Genetic Programming strategies based on objective functions that attempt to maximize precision and recall while minimizing the similarity among the retrieved results. To this end, we define three novel objective functions based on result set similarity and on the information theoretic notion of entropy. Extensive experiments allow us to conclude that while the proposed strategies significantly improve precision after a few generations, only some of them are able to maintain or improve global recall. A comparative analysis against previous strategies based on Multi-Objective Evolutionary Algorithms, indicates that the proposed approach is superior in terms of precision and global recall. Furthermore, when compared to query-term-selection methods based on existing state-of-the-art term-weighting schemes, the presented Multi-Objective Genetic Programming strategies demonstrate significantly higher levels of precision, recall, and F1-score, while maintaining competitive global recall. Finally, we identify the strengths and limitations of the strategies and conclude that the choice of objectives to be maximized or minimized should be guided by the application at hand.
publishDate 2023
dc.date.none.fl_str_mv 2023-11
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/225085
Baggio, Cecilia; Lorenzetti, Carlos Martin; Cecchini, Rocío Luján; Maguitman, Ana Gabriela; Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall; PeerJ Inc.; PeerJ Computer Science; 9; 11-2023; 1-39
2376-5992
CONICET Digital
CONICET
url http://hdl.handle.net/11336/225085
identifier_str_mv Baggio, Cecilia; Lorenzetti, Carlos Martin; Cecchini, Rocío Luján; Maguitman, Ana Gabriela; Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recall; PeerJ Inc.; PeerJ Computer Science; 9; 11-2023; 1-39
2376-5992
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://peerj.com/articles/cs-1710
info:eu-repo/semantics/altIdentifier/doi/10.7717/peerj-cs.1710
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
dc.publisher.none.fl_str_mv PeerJ Inc.
publisher.none.fl_str_mv PeerJ Inc.
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar
_version_ 1846083530669424640
score 13.22299