A semi-supervised incremental algorithm to automatically formulate topical queries
- Authors
- Lorenzetti, Carlos Martin; Maguitman, Ana Gabriela
- Year of publication
- 2009
- Language
- English
- Resource type
- article
- Status
- published version
- Description
The quality of the material collected by context-based Web search systems is highly dependent on the vocabulary used to generate the search queries. This paper proposes applying a semi-supervised algorithm to incrementally learn terms that can help bridge the terminology gap between the user's information needs and the vocabulary of the relevant documents. The learning strategy uses an incrementally retrieved, topic-dependent selection of Web documents for term-weight reinforcement, reflecting how apt the terms are for describing and discriminating the topic of the user context. The new algorithm learns new descriptors by searching for terms that tend to occur often in relevant documents, and learns good discriminators by identifying terms that tend to occur only in the context of the given topic. The enriched vocabulary allows the formulation of search queries that are more effective than queries generated directly from the terms of the initial topic description. An evaluation on a large collection of topics using one standard and two ad hoc performance evaluation metrics suggests that the proposed technique is superior to a baseline and to other existing query reformulation techniques.
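The abstract describes the learning strategy only at a high level. The Python sketch below illustrates the general idea of descriptor and discriminator reinforcement under loudly stated assumptions. Documents are modelled as sets of terms. The labelling rule (`split_by_topic`, `min_overlap`), the two scoring formulas (`descriptive_power`, `discriminating_power`), the additive update (`reinforce`, `rate`) and the top-k query builder (`formulate_query`, `k`) are all hypothetical stand-ins, not the formulas published in the paper. Fixed batches replace the incrementally retrieved Web results. In a real system, each round's query would drive the next retrieval.

```python
# Hypothetical sketch: the scoring formulas, labelling rule and update rule
# below are illustrative assumptions, not the formulas published in the paper.
from collections import Counter


def split_by_topic(batch, topic_terms, min_overlap=2):
    """Assumed semi-supervised labelling: a retrieved document (a set of terms)
    counts as relevant if it shares >= min_overlap terms with the topic."""
    relevant = [doc for doc in batch if len(doc & topic_terms) >= min_overlap]
    others = [doc for doc in batch if len(doc & topic_terms) < min_overlap]
    return relevant, others


def descriptive_power(relevant):
    """Descriptor score: share of relevant documents that contain each term."""
    counts = Counter(term for doc in relevant for term in doc)
    return {term: c / len(relevant) for term, c in counts.items()}


def discriminating_power(relevant, others):
    """Discriminator score: of the documents containing a term, the share that
    are relevant (1.0 means the term occurs only in the topic's context)."""
    in_rel = Counter(term for doc in relevant for term in doc)
    in_other = Counter(term for doc in others for term in doc)
    return {term: n / (n + in_other[term]) for term, n in in_rel.items()}


def reinforce(weights, relevant, others, rate=0.5):
    """One incremental round: add both scores, scaled by rate, to the running
    term weights. Terms seen only in non-relevant documents gain nothing."""
    desc = descriptive_power(relevant)
    disc = discriminating_power(relevant, others)
    updated = dict(weights)
    for term in desc:
        updated[term] = updated.get(term, 0.0) + rate * (desc[term] + disc[term])
    return updated


def formulate_query(weights, k=3):
    """Query = the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights, key=lambda term: (-weights[term], term))
    return " ".join(ranked[:k])


def incremental_queries(topic_terms, batches, k=3):
    """Run one reinforcement round per batch and record the query after each."""
    weights = {term: 1.0 for term in topic_terms}
    queries = []
    for batch in batches:
        relevant, others = split_by_topic(batch, topic_terms)
        weights = reinforce(weights, relevant, others)
        queries.append(formulate_query(weights, k))
    return queries


if __name__ == "__main__":
    topic = frozenset({"query", "search"})
    round1 = [frozenset({"query", "search", "reformulation"}),
              frozenset({"query", "search", "vocabulary", "reformulation"}),
              frozenset({"search", "football"})]
    round2 = [frozenset({"query", "search", "vocabulary", "expansion"}),
              frozenset({"query", "vocabulary", "expansion"})]
    print(incremental_queries(topic, [round1, round2]))
    # prints ['query search reformulation', 'search query vocabulary']
```

In this toy run, terms that are absent from the initial topic description (`reformulation`, then `vocabulary`) enter the query as their weights are reinforced. This mirrors the vocabulary enrichment the abstract describes. The term `football`, which appears only in a non-relevant document, never gains weight.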
Affiliation: Lorenzetti, Carlos Martin. Universidad Nacional del Sur. Departamento de Ciencia e Ingeniería de la Computación. Laboratorio de Investigación y Desarrollo en Inteligencia Artificial; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina
Affiliation: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Ciencia e Ingeniería de la Computación. Laboratorio de Investigación y Desarrollo en Inteligencia Artificial; Argentina
- Subjects
- Context
- Query Formulation
- Topical Queries
- Web Search
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/73615
Full record metadata
id | CONICETDig_f6eb2377e27338e0a90a2d56d7bd90c9 |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/73615 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
publishDate | 2009 |
dc.date.none.fl_str_mv | 2009-05 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format | article |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/73615 Lorenzetti, Carlos Martin; Maguitman, Ana Gabriela; A semi-supervised incremental algorithm to automatically formulate topical queries; Elsevier Science Inc; Information Sciences; 179; 12; 5-2009; 1881-1892 0020-0255 CONICET Digital CONICET |
url | http://hdl.handle.net/11336/73615 |
dc.language.none.fl_str_mv | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/doi/10.1016/j.ins.2009.01.029 info:eu-repo/semantics/altIdentifier/url/https://www.sciencedirect.com/science/article/pii/S0020025509000565 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf application/pdf application/pdf application/pdf |
dc.publisher.none.fl_str_mv | Elsevier Science Inc |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |
_version_ | 1842269163342528512 |