A semi-supervised incremental algorithm to automatically formulate topical queries

Authors
Lorenzetti, Carlos Martin; Maguitman, Ana Gabriela
Year of publication
2009
Language
English
Resource type
article
Status
published version
Description
The quality of the material collected by a context-based Web search system is highly dependent on the vocabulary used to generate the search queries. This paper proposes applying a semi-supervised algorithm to incrementally learn terms that can help bridge the terminology gap between the user's information needs and the vocabulary of the relevant documents. The learning strategy uses an incrementally retrieved, topic-dependent selection of Web documents for term-weight reinforcement, reflecting how apt the terms are for describing and discriminating the topic of the user context. The new algorithm learns new descriptors by searching for terms that tend to occur often in relevant documents, and learns good discriminators by identifying terms that tend to occur only in the context of the given topic. The enriched vocabulary allows the formulation of search queries that are more effective than queries generated directly from the terms of the initial topic description. An evaluation on a large collection of topics using one standard and two ad hoc performance evaluation metrics suggests that the proposed technique is superior to a baseline and to other existing query reformulation techniques.
Affiliation: Lorenzetti, Carlos Martin. Universidad Nacional del Sur. Departamento de Ciencia e Ingeniería de la Computación. Laboratorio de Investigación y Desarrollo en Inteligencia Artificial; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina
Affiliation: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Ciencia e Ingeniería de la Computación. Laboratorio de Investigación y Desarrollo en Inteligencia Artificial; Argentina
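The abstract describes the approach only at a high level. As a rough illustration, and not the authors' implementation, the Python sketch below shows one way its two ideas could fit together: descriptors are scored by how often a term appears across documents judged relevant, discriminators by how exclusively a term appears in those documents, and both weight tables are reinforced incrementally over successive search rounds. The Jaccard-based self-labelling rule, the moving-average reinforcement, the `search` callback, the toy corpus, and every function name and threshold here are assumptions made for illustration; the formulas and query-selection scheme in the paper may differ.

```python
# Hypothetical sketch, NOT the authors' algorithm: descriptor/discriminator
# term weighting with incremental reinforcement and self-labelled results.

def score_terms(docs, relevant):
    """Score terms seen in at least one document labelled relevant.

    docs: list of term sets; relevant: parallel list of bools.
    descriptive[t]    = fraction of relevant docs containing t
    discriminating[t] = fraction of docs containing t that are relevant
    """
    n_rel = sum(relevant)
    df_all, df_rel = {}, {}
    for terms, is_rel in zip(docs, relevant):
        for t in terms:
            df_all[t] = df_all.get(t, 0) + 1
            if is_rel:
                df_rel[t] = df_rel.get(t, 0) + 1
    descriptive = {t: c / n_rel for t, c in df_rel.items()}
    discriminating = {t: c / df_all[t] for t, c in df_rel.items()}
    return descriptive, discriminating


def reinforce(weights, new_scores, rate=0.5):
    """Blend new evidence into stored weights: w <- (1 - rate) * w + rate * new."""
    for t in set(weights) | set(new_scores):
        weights[t] = (1 - rate) * weights.get(t, 0.0) + rate * new_scores.get(t, 0.0)


def top_terms(weights, k, exclude=()):
    """Highest-weighted terms; ties are broken alphabetically for determinism."""
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return [t for t in ranked if t not in exclude][:k]


def formulate_query(descriptors, discriminators, n_desc=2, n_disc=2):
    """Top descriptors plus the best discriminators not already chosen."""
    chosen = top_terms(descriptors, n_desc)
    return chosen + top_terms(discriminators, n_disc, exclude=chosen)


def is_relevant(doc_terms, topic_terms, threshold=0.3):
    """Assumed self-labelling rule: Jaccard overlap with the current topic terms."""
    union = doc_terms | topic_terms
    return bool(union) and len(doc_terms & topic_terms) / len(union) >= threshold


def learn(initial_terms, search, iterations=3, rate=0.5):
    """search(query_terms) -> list of term sets; a hypothetical search backend."""
    descriptors = {t: 1.0 for t in initial_terms}
    discriminators = {t: 1.0 for t in initial_terms}
    for _ in range(iterations):
        docs = search(formulate_query(descriptors, discriminators))
        topic = set(top_terms(descriptors, 10))
        labels = [is_relevant(d, topic) for d in docs]
        new_desc, new_disc = score_terms(docs, labels)
        reinforce(descriptors, new_desc, rate)
        reinforce(discriminators, new_disc, rate)
    return descriptors, discriminators


# Toy corpus and search stub (illustrative only).
CORPUS = [
    {"java", "jvm", "bytecode", "garbage", "collection"},
    {"java", "jvm", "jit", "compiler"},
    {"java", "island", "indonesia", "travel"},
    {"jvm", "bytecode", "jit"},
]


def toy_search(query):
    """Return every toy document sharing at least one term with the query."""
    return [d for d in CORPUS if d & set(query)]


desc, disc = learn(["java", "jvm"], toy_search)
print(formulate_query(desc, disc))  # ['jvm', 'java', 'collection', 'compiler']
```

In this toy run the travel document about the island of Java never passes the relevance threshold, so `island`, `indonesia`, and `travel` never enter either vocabulary, while `bytecode`, `jit`, and `compiler` are learned from the documents that do. A moving average was chosen so that evidence from earlier rounds decays gradually instead of being overwritten; other reinforcement schemes could be substituted.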
Subject
Context
Query Formulation
Topical Queries
Web Search
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/73615

dc.title.none.fl_str_mv A semi-supervised incremental algorithm to automatically formulate topical queries
dc.creator.none.fl_str_mv Lorenzetti, Carlos Martin
Maguitman, Ana Gabriela
dc.subject.none.fl_str_mv Context
Query Formulation
Topical Queries
Web Search
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2009
dc.date.none.fl_str_mv 2009-05
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/73615
Lorenzetti, Carlos Martin; Maguitman, Ana Gabriela; A semi-supervised incremental algorithm to automatically formulate topical queries; Elsevier Science Inc; Information Sciences; 179; 12; 5-2009; 1881-1892
0020-0255
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/doi/10.1016/j.ins.2009.01.029
info:eu-repo/semantics/altIdentifier/url/https://www.sciencedirect.com/science/article/pii/S0020025509000565
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
application/pdf
application/pdf
dc.publisher.none.fl_str_mv Elsevier Science Inc
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar