A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media

Authors
Maisonnave, Mariano; Delbianco, Fernando Andrés; Tohmé, Fernando Abel; Maguitman, Ana Gabriela
Publication year
2018
Language
English
Resource type
conference paper
Status
published version
Description
Successful modeling and prediction depend on effective methods for extracting domain-relevant variables. This paper proposes a methodology for identifying domain-specific terms. The methodology relies on a collection of documents labeled as relevant or irrelevant to the domain under analysis. Based on this labeled collection, we propose a supervised technique that weights terms according to their descriptive and discriminating power. The descriptive and discriminating values are then combined into a general measure that, through an adjustable parameter, makes it possible to favor different aspects of retrieval independently, such as maximizing precision or recall, or achieving a balance between the two. The proposed technique is applied to the economic domain and is evaluated empirically through a human-subject experiment involving experts and non-experts in economics. It is also evaluated as a term-weighting technique for query-term selection, showing promising results. Finally, we illustrate the potential of the proposal as a first step toward identifying different types of associations between words.
Affiliation: Maisonnave, Mariano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Delbianco, Fernando Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Matemática Bahía Blanca. Universidad Nacional del Sur. Departamento de Matemática. Instituto de Matemática Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Economía; Argentina
Affiliation: Tohmé, Fernando Abel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Matemática Bahía Blanca. Universidad Nacional del Sur. Departamento de Matemática. Instituto de Matemática Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Economía; Argentina
Affiliation: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
XIX Simposio Argentino de Inteligencia Artificial
Buenos Aires
Argentina
Universidad de Palermo. Facultad de Ingeniería. Asociación Argentina de Inteligencia Artificial. Sociedad Argentina de Informática
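The term-weighting scheme summarized in the description above can be illustrated with a small sketch. This record does not give the paper's formulas, so everything here is an assumption made for illustration: descriptive power is taken as the share of relevant documents that contain a term (recall-like), discriminating power as the share of documents containing the term that are relevant (precision-like), and the two are combined with an F-beta-style weighted harmonic mean in which the hypothetical parameter `beta` plays the role of the adjustable parameter. The function name `term_scores` and the toy corpus are also hypothetical.

```python
from collections import Counter


def term_scores(docs, labels, beta=1.0):
    """Score terms from labeled documents (illustrative sketch, not the paper's code).

    docs   : list of token lists
    labels : list of bools, True when the document is relevant to the domain
    beta   : assumed trade-off parameter; values above 1 favor descriptive
             power (recall-like), values below 1 favor discriminating power
             (precision-like)
    """
    n_relevant = sum(labels)
    df_all = Counter()  # number of documents containing each term
    df_rel = Counter()  # number of relevant documents containing each term
    for tokens, relevant in zip(docs, labels):
        for term in set(tokens):
            df_all[term] += 1
            if relevant:
                df_rel[term] += 1

    scores = {}
    for term, n_docs in df_all.items():
        # Assumed definitions; the record itself does not state them.
        descr = df_rel[term] / n_relevant if n_relevant else 0.0
        discr = df_rel[term] / n_docs
        denom = beta ** 2 * discr + descr
        scores[term] = (1 + beta ** 2) * discr * descr / denom if denom else 0.0
    return scores


# Toy corpus: two documents labeled relevant to economics, two labeled irrelevant.
docs = [
    ["inflation", "rate", "bank"],
    ["inflation", "market"],
    ["football", "rate"],
    ["football", "match"],
]
labels = [True, True, False, False]

for term, score in sorted(term_scores(docs, labels).items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {score:.3f}")
```

In this toy setting, a term that appears in every relevant document and in no irrelevant one receives the maximum score, while a term that appears only in irrelevant documents scores zero. Lowering `beta` rewards terms that are rare but exclusive to the relevant class; raising it rewards terms that cover more of the relevant documents.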
Subject
TERM WEIGHTING
VARIABLE EXTRACTION
INFORMATION RETRIEVAL
QUERY TERM SELECTION
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/135484

id CONICETDig_73c258c17d6dd56efaaef04257e8903f
oai_identifier_str oai:ri.conicet.gov.ar:11336/135484
network_acronym_str CONICETDig
repository_id_str 3498
network_name_str CONICET Digital (CONICET)
dc.title.none.fl_str_mv A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media
dc.creator.none.fl_str_mv Maisonnave, Mariano
Delbianco, Fernando Andrés
Tohmé, Fernando Abel
Maguitman, Ana Gabriela
dc.subject.none.fl_str_mv TERM WEIGHTING
VARIABLE EXTRACTION
INFORMATION RETRIEVAL
QUERY TERM SELECTION
purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
publishDate 2018
dc.date.none.fl_str_mv 2018
dc.type.none.fl_str_mv info:eu-repo/semantics/publishedVersion
info:eu-repo/semantics/conferenceObject
Simposio
Journal
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
status_str publishedVersion
format conferenceObject
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/135484
A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media; XIX Simposio Argentino de Inteligencia Artificial; Buenos Aires; Argentina; 2018; 40-53
2451-7585
CONICET Digital
CONICET
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://47jaiio.sadio.org.ar/index.php?q=asai
info:eu-repo/semantics/altIdentifier/url/https://47jaiio.sadio.org.ar/index.php?q=node/81
info:eu-repo/semantics/altIdentifier/url/https://47jaiio.sadio.org.ar/sites/default/files/ASAI-07.pdf
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
eu_rights_str_mv openAccess
rights_invalid_str_mv https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv National
dc.publisher.none.fl_str_mv Sociedad Argentina de Informática
publisher.none.fl_str_mv Sociedad Argentina de Informática
dc.source.none.fl_str_mv reponame:CONICET Digital (CONICET)
instname:Consejo Nacional de Investigaciones Científicas y Técnicas
reponame_str CONICET Digital (CONICET)
collection CONICET Digital (CONICET)
instname_str Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar