A flexible supervised term-weighting technique and its application to variable extraction and information retrieval

Authors
Maisonnave, Mariano; Delbianco, Fernando Andrés; Tohmé, Fernando Abel; Maguitman, Ana Gabriela
Publication year
2019
Language
English
Resource type
article
Status
published version
Description
Successful modeling and prediction depend on effective methods for extracting domain-relevant variables. This paper proposes a methodology for identifying domain-specific terms. The methodology relies on a collection of documents labeled as relevant or irrelevant to the domain under analysis. Based on this labeled collection, we propose a supervised technique that weights terms according to their descriptive and discriminating power. The descriptive and discriminating values are then combined into a general measure that, through an adjustable parameter, makes it possible to favor different aspects of retrieval independently, such as maximizing precision or recall, or achieving a balance between the two. The proposed technique is applied to the economic domain and is empirically evaluated through a human-subject experiment involving experts and non-experts in Economics. It is also evaluated as a term-weighting technique for query-term selection, showing promising results. Finally, we illustrate the applicability of the proposed technique to diverse problems such as building prediction models, supporting knowledge modeling, and achieving total recall.
Affiliation: Maisonnave, Mariano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Delbianco, Fernando Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Investigaciones Económicas y Sociales del Sur. Universidad Nacional del Sur. Departamento de Economía. Instituto de Investigaciones Económicas y Sociales del Sur; Argentina
Affiliation: Tohmé, Fernando Abel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Matemática Bahía Blanca. Universidad Nacional del Sur. Departamento de Matemática. Instituto de Matemática Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Economía; Argentina
Affiliation: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
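Illustrative sketch (not part of the record): the abstract above describes weighting terms by descriptive and discriminating power and combining the two through an adjustable parameter, but this record does not give the formulas. The code below is one plausible reading under stated assumptions: descriptive power is taken to be the fraction of relevant documents that contain the term (recall-like), discriminating power the fraction of documents containing the term that are relevant (precision-like), and the combination an F-beta-style weighted harmonic mean. The function name `term_weights` and the parameter name `beta` are hypothetical and may not match the paper's definitions.

```python
from typing import Dict, Iterable, List


def term_weights(
    docs: List[Iterable[str]], labels: List[bool], beta: float = 1.0
) -> Dict[str, float]:
    """Weight each term from a labeled collection (assumed formulation).

    docs   -- one iterable of terms per document
    labels -- True if the corresponding document is relevant to the domain
    beta   -- assumed trade-off parameter: beta > 1 favors descriptive
              (recall-like) power, beta < 1 favors discriminating
              (precision-like) power
    """
    n_relevant = sum(labels)
    doc_freq: Dict[str, int] = {}  # documents containing the term
    rel_freq: Dict[str, int] = {}  # relevant documents containing the term

    for terms, is_relevant in zip(docs, labels):
        for term in set(terms):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
            if is_relevant:
                rel_freq[term] = rel_freq.get(term, 0) + 1

    weights: Dict[str, float] = {}
    for term, df in doc_freq.items():
        rf = rel_freq.get(term, 0)
        descr = rf / n_relevant if n_relevant else 0.0  # assumed definition
        discr = rf / df                                 # assumed definition
        denom = beta**2 * discr + descr
        # F-beta-style combination; terms absent from relevant documents get 0.
        weights[term] = (1 + beta**2) * discr * descr / denom if denom else 0.0
    return weights
```

With a toy collection of two relevant documents and one irrelevant one, a term that occurs in every relevant document and in no irrelevant one receives the maximum weight, while a term that occurs only in irrelevant documents receives zero. Lowering `beta` raises the weight of terms that are rare but occur only in relevant documents, which matches the abstract's description of a parameter for favoring precision over recall.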
Subject
INFORMATION RETRIEVAL
QUERY-TERM SELECTION
TERM WEIGHTING
VARIABLE EXTRACTION
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/92800

id CONICETDig_effb59471c1743a80a4743ed82516782
oai_identifier_str oai:ri.conicet.gov.ar:11336/92800
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2019
dc.date.none.fl_str_mv 2019-02
dc.type.none.fl_str_mv info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_6501
info:ar-repo/semantics/articulo
format article
status_str publishedVersion
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/92800
Maisonnave, Mariano; Delbianco, Fernando Andrés; Tohmé, Fernando Abel; Maguitman, Ana Gabriela; A flexible supervised term-weighting technique and its application to variable extraction and information retrieval; Iberamia; Inteligencia Artificial; 22; 63; 2-2019; 61-80
1137-3601
1988-3064
CONICET Digital
CONICET
url http://hdl.handle.net/11336/92800
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/http://journal.iberamia.org/index.php/intartif/article/view/255
info:eu-repo/semantics/altIdentifier/doi/10.4114/intartif.vol22iss63pp61-80
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc/2.5/ar/
dc.format.none.fl_str_mv application/pdf
application/pdf
application/pdf
dc.publisher.none.fl_str_mv Iberamia
publisher.none.fl_str_mv Iberamia
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar