A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media

Authors: Maisonnave, Mariano; Delbianco, Fernando; Tohmé, Fernando Abel; Maguitman, Ana Gabriela
Publication year: 2018
Language: English
Resource type: conference paper
Status: published version
Description: Successful modeling and prediction depend on effective methods for the extraction of domain-relevant variables. This paper proposes a methodology for identifying domain-specific terms. The proposed methodology relies on a collection of documents labeled as relevant or irrelevant to the domain under analysis. Based on the labeled document collection, we propose a supervised technique that weights terms based on their descriptive and discriminating power. The descriptive and discriminating values are then combined into a general measure that, through an adjustable parameter, makes it possible to favor different aspects of retrieval independently, such as maximizing precision, maximizing recall, or achieving a balance between the two. The proposed technique is applied to the economic domain and is empirically evaluated through a human-subject experiment involving experts and non-experts in economics. It is also evaluated as a term-weighting technique for query-term selection, showing promising results. Finally, we illustrate the potential of the proposal as a first step toward identifying different types of associations between words.
Sociedad Argentina de Informática e Investigación Operativa
Subject: Computer Science
term weighting
variable extraction
information retrieval
query-term selection
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-sa/3.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/70694
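
The description above names two per-term quantities, descriptive power and discriminating power, and a combined measure with an adjustable parameter, but this record does not give the formulas. As an illustration only, the sketch below assumes that descriptive power is the fraction of relevant documents that contain a term (recall-like), that discriminating power is the fraction of documents containing the term that are relevant (precision-like), and that the two are combined with an F-beta-style weighted harmonic mean. The function name `term_scores`, the parameter `beta`, and the toy documents are hypothetical and may differ from the paper's definitions.

```python
from typing import Dict, List, Set, Tuple


def term_scores(docs: List[Tuple[Set[str], bool]], beta: float = 1.0) -> Dict[str, float]:
    """Score terms from (term_set, is_relevant) pairs.

    Assumed definitions (not taken from the record):
      descr(t) = relevant docs containing t / all relevant docs
      discr(t) = relevant docs containing t / all docs containing t
      score(t) = (1 + beta^2) * discr * descr / (beta^2 * discr + descr)
    """
    n_relevant = sum(1 for _, is_relevant in docs if is_relevant)
    df_all: Dict[str, int] = {}  # document frequency over all docs
    df_rel: Dict[str, int] = {}  # document frequency over relevant docs
    for terms, is_relevant in docs:
        for term in terms:
            df_all[term] = df_all.get(term, 0) + 1
            if is_relevant:
                df_rel[term] = df_rel.get(term, 0) + 1

    scores: Dict[str, float] = {}
    for term, n_all in df_all.items():
        n_rel = df_rel.get(term, 0)
        descr = n_rel / n_relevant if n_relevant else 0.0
        discr = n_rel / n_all
        denom = beta ** 2 * discr + descr
        scores[term] = (1 + beta ** 2) * discr * descr / denom if denom else 0.0
    return scores


# Toy labeled collection: two documents marked relevant, two irrelevant.
docs = [
    ({"inflation", "market"}, True),
    ({"inflation", "bank"}, True),
    ({"market", "football"}, False),
    ({"football"}, False),
]
for term, score in sorted(term_scores(docs).items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

Under these assumptions, a value of `beta` below 1 weights discriminating power more heavily, and a value above 1 weights descriptive power more heavily.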

id	SEDICI_3533652910db326a73917319ff10f1c3
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/70694
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
dc.title.none.fl_str_mv	A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media
dc.creator.none.fl_str_mv	Maisonnave, Mariano; Delbianco, Fernando; Tohmé, Fernando Abel; Maguitman, Ana Gabriela
author	Maisonnave, Mariano
author_facet	Maisonnave, Mariano; Delbianco, Fernando; Tohmé, Fernando Abel; Maguitman, Ana Gabriela
author_role	author
author2	Delbianco, Fernando; Tohmé, Fernando Abel; Maguitman, Ana Gabriela
author2_role	author author author
dc.subject.none.fl_str_mv	Computer Science; term weighting; variable extraction; information retrieval; query-term selection
topic	Computer Science; term weighting; variable extraction; information retrieval; query-term selection
publishDate	2018
dc.date.none.fl_str_mv	2018-09
dc.type.none.fl_str_mv	info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Conference object http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia
format	conferenceObject
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/70694
url	http://sedici.unlp.edu.ar/handle/10915/70694
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/http://47jaiio.sadio.org.ar/sites/default/files/ASAI-07.pdf info:eu-repo/semantics/altIdentifier/issn/2451-7585
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
dc.format.none.fl_str_mv	application/pdf 40-53
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP
reponame_str	SEDICI (UNLP)
collection	SEDICI (UNLP)
instname_str	Universidad Nacional de La Plata
instacron_str	UNLP
institution	UNLP
repository.name.fl_str_mv	SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar