A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media
- Authors
- Maisonnave, Mariano; Delbianco, Fernando; Tohmé, Fernando Abel; Maguitman, Ana Gabriela
- Year of publication
- 2018
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Successful modeling and prediction depend on effective methods for extracting domain-relevant variables. This paper proposes a methodology for identifying domain-specific terms. The methodology relies on a collection of documents labeled as relevant or irrelevant to the domain under analysis. Using this labeled collection, we propose a supervised technique that weights terms according to their descriptive and discriminating power. The descriptive and discriminating values are then combined into a general measure whose adjustable parameter makes it possible to favor different aspects of retrieval independently, such as maximizing precision or recall, or balancing the two. The proposed technique is applied to the economic domain and evaluated empirically through a human-subject experiment involving experts and non-experts in economics. It is also evaluated as a term-weighting technique for query-term selection, with promising results. Finally, we illustrate the potential of the proposal as a first step toward identifying different types of associations between words.
- Sociedad Argentina de Informática e Investigación Operativa
- Subject
- Computer Science; term weighting; variable extraction; information retrieval; query-term selection
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-sa/3.0/
- Repository
- SEDICI (UNLP)
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/70694
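The abstract names the ingredients of the weighting scheme: a collection labeled relevant/irrelevant, a descriptive score, a discriminating score, and an adjustable parameter that trades precision against recall. It does not give the formulas. The Python sketch below shows one way such a scheme could work, under our own assumptions. Descriptive power is taken as the share of relevant documents that contain a term, which is recall-like. Discriminating power is taken as the share of documents containing the term that are relevant, which is precision-like. The two are merged with an F-beta-style weighted harmonic mean controlled by a hypothetical `beta` parameter. These definitions, the function names, and the toy corpus are illustrative only and are not the authors' published method.

```python
from typing import List, Set, Tuple

# Hypothetical labeled corpus: each document is (set of terms, is_relevant).
Corpus = List[Tuple[Set[str], bool]]


def descriptive_power(term: str, corpus: Corpus) -> float:
    """Assumed definition: fraction of relevant documents containing `term`
    (recall-like: how much of the domain the term covers)."""
    relevant = [terms for terms, is_rel in corpus if is_rel]
    if not relevant:
        return 0.0
    return sum(term in terms for terms in relevant) / len(relevant)


def discriminating_power(term: str, corpus: Corpus) -> float:
    """Assumed definition: fraction of documents containing `term` that are
    relevant (precision-like: how specific the term is to the domain)."""
    labels = [is_rel for terms, is_rel in corpus if term in terms]
    if not labels:
        return 0.0
    return sum(labels) / len(labels)


def combined_weight(term: str, corpus: Corpus, beta: float = 1.0) -> float:
    """F-beta-style weighted harmonic mean of the two scores (an assumption).

    beta > 1 leans toward descriptive power (recall),
    beta < 1 leans toward discriminating power (precision),
    beta == 1 weighs both equally.
    """
    desc = descriptive_power(term, corpus)
    disc = discriminating_power(term, corpus)
    if desc == 0.0 or disc == 0.0:
        return 0.0  # a harmonic mean with a zero component is zero
    b2 = beta * beta
    return (1 + b2) * disc * desc / (b2 * disc + desc)


def rank_terms(corpus: Corpus, beta: float = 1.0) -> List[Tuple[str, float]]:
    """Score every vocabulary term and sort by descending weight (ties by name)."""
    vocabulary = set().union(*(terms for terms, _ in corpus))
    scored = [(t, combined_weight(t, corpus, beta)) for t in vocabulary]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Toy corpus, invented for illustration only.
CORPUS: Corpus = [
    ({"inflation", "rate", "the"}, True),
    ({"inflation", "unemployment", "the"}, True),
    ({"football", "the"}, False),
    ({"rate", "football"}, False),
]

if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0):
        print(beta, [(t, round(w, 3)) for t, w in rank_terms(CORPUS, beta)])
```

On this toy corpus, `beta = 1` ranks the stopword-like `the` (weight 0.8) above `unemployment` (about 0.67), because `the` appears in every relevant document. Lowering `beta` to 0.5 reverses that order (about 0.83 vs. 0.71), because `the` also occurs in an irrelevant document. This is a miniature illustration of the kind of precision/recall trade-off the abstract attributes to its adjustable parameter.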
Full record metadata
| Field | Value |
|---|---|
| id | SEDICI_3533652910db326a73917319ff10f1c3 |
| oai_identifier_str | oai:sedici.unlp.edu.ar:10915/70694 |
| network_acronym_str | SEDICI |
| repository_id_str | 1329 |
| network_name_str | SEDICI (UNLP) |
| dc.title.none.fl_str_mv | A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media |
| dc.creator.none.fl_str_mv | Maisonnave, Mariano; Delbianco, Fernando; Tohmé, Fernando Abel; Maguitman, Ana Gabriela |
| dc.subject.none.fl_str_mv | Computer Science; term weighting; variable extraction; information retrieval; query-term selection |
| publishDate | 2018 |
| dc.date.none.fl_str_mv | 2018-09 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Conference object; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
| format | conferenceObject |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/70694 |
| dc.language.none.fl_str_mv | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/http://47jaiio.sadio.org.ar/sites/default/files/ASAI-07.pdf; info:eu-repo/semantics/altIdentifier/issn/2451-7585 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-sa/3.0/; Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf; 40-53 |
| dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
| reponame_str | SEDICI (UNLP) |
| instname_str | Universidad Nacional de La Plata |
| instacron_str | UNLP |
| repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
| repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |