A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media

Authors
Maisonnave, Mariano; Delbianco, Fernando; Tohmé, Fernando Abel; Maguitman, Ana Gabriela
Publication year
2018
Language
English
Resource type
conference paper
Status
published version
Description
Successful modeling and prediction depend on effective methods for extracting domain-relevant variables. This paper proposes a methodology for identifying domain-specific terms. The methodology relies on a collection of documents labeled as relevant or irrelevant to the domain under analysis. Based on this labeled collection, we propose a supervised technique that weights terms according to their descriptive and discriminating power. The descriptive and discriminating values are then combined into a general measure whose adjustable parameter makes it possible to favor different aspects of retrieval independently, such as maximizing precision or recall, or achieving a balance between the two. The technique is applied to the economic domain and is empirically evaluated through a human-subject experiment involving experts and non-experts in economics. It is also evaluated as a term-weighting technique for query-term selection, showing promising results. Finally, we illustrate the potential of the proposal as a first step toward identifying different types of associations between words.
Sociedad Argentina de Informática e Investigación Operativa
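As a rough illustration of the idea described in the abstract, the sketch below scores terms from a labeled document collection. The formulas are assumptions, since the abstract does not state them: descriptive power is taken as the fraction of relevant documents that contain the term (recall-like), discriminating power as the fraction of documents containing the term that are relevant (precision-like), and the two are combined with an F-beta-style weighted harmonic mean in which `beta` plays the role of the adjustable parameter. The function name `term_scores` and its arguments are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def term_scores(
    docs: Sequence[List[str]],
    labels: Sequence[bool],
    beta: float = 1.0,
) -> Dict[str, float]:
    """Score each term from tokenized documents and relevance labels.

    Assumed definitions (not taken from the paper):
      descr(t) = relevant docs containing t / all relevant docs
      discr(t) = relevant docs containing t / all docs containing t
      score(t) = (1 + beta^2) * discr * descr / (beta^2 * discr + descr)
    """
    n_relevant = sum(1 for is_rel in labels if is_rel)
    df_all: Counter = Counter()  # document frequency over the whole collection
    df_rel: Counter = Counter()  # document frequency over relevant documents

    for tokens, is_rel in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df_all[term] += 1
            if is_rel:
                df_rel[term] += 1

    scores: Dict[str, float] = {}
    for term, df in df_all.items():
        rel = df_rel[term]
        descr = rel / n_relevant if n_relevant else 0.0
        discr = rel / df
        denom = beta ** 2 * discr + descr
        scores[term] = (1 + beta ** 2) * discr * descr / denom if denom else 0.0
    return scores
```

Under these assumed definitions, `beta` greater than 1 rewards terms that appear in most relevant documents even if they also appear elsewhere, while `beta` less than 1 rewards terms that occur almost exclusively in relevant documents.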
Subject
Computer Science
term weighting
variable extraction
information retrieval
query-term selection
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-sa/3.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/70694

id SEDICI_3533652910db326a73917319ff10f1c3
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
dc.date.none.fl_str_mv 2018-09
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Conference object
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/70694
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/http://47jaiio.sadio.org.ar/sites/default/files/ASAI-07.pdf
info:eu-repo/semantics/altIdentifier/issn/2451-7585
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-sa/3.0/
Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
dc.format.none.fl_str_mv application/pdf
40-53
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar