A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media
- Authors
- Maisonnave, Mariano; Delbianco, Fernando Andrés; Tohmé, Fernando Abel; Maguitman, Ana Gabriela
- Publication year
- 2018
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Successful modeling and prediction depend on effective methods for extracting domain-relevant variables. This paper proposes a methodology for identifying domain-specific terms. The methodology relies on a collection of documents labeled as relevant or irrelevant to the domain under analysis. Based on this labeled collection, we propose a supervised technique that weights terms according to their descriptive and discriminating power. The descriptive and discriminating values are then combined into a general measure that, through an adjustable parameter, makes it possible to favor different aspects of retrieval independently, such as maximizing precision, maximizing recall, or balancing the two. The proposed technique is applied to the economic domain and is evaluated empirically through a human-subject experiment involving experts and non-experts in economics. It is also evaluated as a term-weighting technique for query-term selection, with promising results. Finally, we illustrate the potential of the proposal as a first step toward identifying different types of associations between words.
Affiliation: Maisonnave, Mariano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
Affiliation: Delbianco, Fernando Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Matemática Bahía Blanca. Universidad Nacional del Sur. Departamento de Matemática. Instituto de Matemática Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Economía; Argentina
Affiliation: Tohmé, Fernando Abel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Matemática Bahía Blanca. Universidad Nacional del Sur. Departamento de Matemática. Instituto de Matemática Bahía Blanca; Argentina. Universidad Nacional del Sur. Departamento de Economía; Argentina
Affiliation: Maguitman, Ana Gabriela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina
XIX Simposio Argentino de Inteligencia Artificial
Buenos Aires
Argentina
Universidad de Palermo. Facultad de Ingeniería. Asociación Argentina de Inteligencia Artificial. Sociedad Argentina de Informática
- Subject
- TERM WEIGHTING; VARIABLE EXTRACTION; INFORMATION RETRIEVAL; QUERY TERM SELECTION
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/135484
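
The Description field above outlines a supervised scheme in which each term receives a descriptive score and a discriminating score, combined through an adjustable parameter. This record does not give the formulas, so the following is only a minimal sketch under stated assumptions: descriptive power is taken to be the fraction of relevant documents that contain the term (recall-like), discriminating power is the fraction of documents containing the term that are relevant (precision-like), and the two are combined with an F-beta-style weighted harmonic mean. The function name `term_weights`, the parameter `beta`, and the sample documents are all hypothetical and may differ from what the paper actually defines.

```python
from typing import Dict, List, Set, Tuple

# Each document is (set of terms, relevance label). Sample data is invented.
LabeledDoc = Tuple[Set[str], bool]


def term_weights(docs: List[LabeledDoc], beta: float = 1.0) -> Dict[str, float]:
    """Score terms from a labeled collection (illustrative assumption, not the paper's code).

    descr(t) = |relevant docs containing t| / |relevant docs|      (assumed definition)
    discr(t) = |relevant docs containing t| / |docs containing t|  (assumed definition)
    weight   = (1 + beta^2) * discr * descr / (beta^2 * discr + descr)
    """
    n_relevant = sum(1 for _, relevant in docs if relevant)
    df_all: Dict[str, int] = {}   # document frequency over all documents
    df_rel: Dict[str, int] = {}   # document frequency over relevant documents
    for terms, relevant in docs:
        for term in terms:
            df_all[term] = df_all.get(term, 0) + 1
            if relevant:
                df_rel[term] = df_rel.get(term, 0) + 1

    weights: Dict[str, float] = {}
    for term, n_term in df_all.items():
        r_term = df_rel.get(term, 0)
        descr = r_term / n_relevant if n_relevant else 0.0
        discr = r_term / n_term
        denom = beta ** 2 * discr + descr
        weights[term] = (1 + beta ** 2) * discr * descr / denom if denom else 0.0
    return weights


docs: List[LabeledDoc] = [
    ({"inflation", "bank"}, True),
    ({"inflation", "rate"}, True),
    ({"bank", "river"}, False),
    ({"match", "goal"}, False),
]

weights_balanced = term_weights(docs, beta=1.0)
weights_recall_heavy = term_weights(docs, beta=2.0)
```

In this sketch, `beta` greater than 1 gives more influence to the descriptive (recall-like) score and `beta` less than 1 gives more influence to the discriminating (precision-like) score, which is one plausible reading of the "adjustable parameter" mentioned in the abstract.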
Full record metadata
id | CONICETDig_73c258c17d6dd56efaaef04257e8903f |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/135484 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media |
dc.creator.none.fl_str_mv | Maisonnave, Mariano; Delbianco, Fernando Andrés; Tohmé, Fernando Abel; Maguitman, Ana Gabriela |
dc.subject.none.fl_str_mv | TERM WEIGHTING; VARIABLE EXTRACTION; INFORMATION RETRIEVAL; QUERY TERM SELECTION |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
publishDate | 2018 |
dc.date.none.fl_str_mv | 2018 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/conferenceObject Simposio Journal http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia |
status_str | publishedVersion |
format | conferenceObject |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/135484 A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media; XIX Simposio Argentino de Inteligencia Artificial; Buenos Aires; Argentina; 2018; 40-53 2451-7585 CONICET Digital CONICET |
url | http://hdl.handle.net/11336/135484 |
identifier_str_mv | A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media; XIX Simposio Argentino de Inteligencia Artificial; Buenos Aires; Argentina; 2018; 40-53 2451-7585 CONICET Digital CONICET |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://47jaiio.sadio.org.ar/index.php?q=asai info:eu-repo/semantics/altIdentifier/url/https://47jaiio.sadio.org.ar/index.php?q=node/81 info:eu-repo/semantics/altIdentifier/url/https://47jaiio.sadio.org.ar/sites/default/files/ASAI-07.pdf |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf application/pdf application/pdf application/pdf |
dc.coverage.none.fl_str_mv | Nacional |
dc.publisher.none.fl_str_mv | Sociedad Argentina de Informática |
publisher.none.fl_str_mv | Sociedad Argentina de Informática |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |