Exploring the impact of word embeddings for disjoint semisupervised Spanish verb sense disambiguation

Authors
Cardellino, Cristian Adrián; Alemany, Laura Alonso
Publication year
2018
Language
English
Resource type
article
Status
published version
Description
This work explores the use of word embeddings as features for Spanish verb sense disambiguation (VSD). This approach is known as disjoint semisupervised learning [21]: an unsupervised algorithm (here, the one that learns the word embeddings) is first trained separately on unlabeled data, and its output is then used by a supervised classifier. We focus primarily on two aspects of VSD trained with unsupervised word representations. First, we show how the domain of the corpus on which the word embeddings are trained affects the performance of the supervised task. Embeddings from a specific domain can improve the results if that domain matches the domain of the supervised task, even when they are trained on smaller corpora. Second, we show that using word embeddings helps the model generalize better than not using them; that is, embeddings reduce the model's tendency to overfit.
Affiliation: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina
Affiliation: Alemany, Laura Alonso. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina
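The two-step scheme summarized in the description can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' method: the co-occurrence count vectors stand in for real word embeddings, the nearest-centroid classifier stands in for whatever supervised model the article uses, and the corpus, the sense labels ("play", "touch") and all function names are hypothetical.

```python
from collections import Counter, defaultdict
import math


def train_embeddings(sentences, window=2):
    """Unsupervised step: build co-occurrence count vectors from unlabeled text.

    Assumption: these sparse count vectors are a simple stand-in for
    dense word embeddings.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def featurize(tokens, embeddings):
    """Represent an instance as the sum of the vectors of its words."""
    total = Counter()
    for word in tokens:
        total.update(embeddings.get(word, {}))  # unknown words are skipped
    return total


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train_classifier(labeled, embeddings):
    """Supervised step: one centroid per sense, built from labeled instances."""
    centroids = defaultdict(Counter)
    for tokens, sense in labeled:
        centroids[sense].update(featurize(tokens, embeddings))
    return dict(centroids)


def predict(tokens, centroids, embeddings):
    """Assign the sense whose centroid is most similar to the instance."""
    feats = featurize(tokens, embeddings)
    return max(centroids, key=lambda sense: cosine(feats, centroids[sense]))


# Hypothetical unlabeled corpus (lemmatized, function words removed).
unlabeled = [
    ["tocar", "guitarra", "concierto"],
    ["tocar", "violín", "concierto"],
    ["tocar", "mesa", "mano"],
    ["tocar", "pared", "mano"],
]
# Hypothetical labeled data: one example per sense of the verb "tocar".
labeled = [
    (["tocar", "guitarra"], "play"),
    (["tocar", "mesa"], "touch"),
]

embeddings = train_embeddings(unlabeled)            # step 1: unlabeled data only
centroids = train_classifier(labeled, embeddings)   # step 2: labeled data only

print(predict(["tocar", "violín"], centroids, embeddings))
print(predict(["tocar", "pared"], centroids, embeddings))
```

In this toy setup "violín" and "pared" never occur in the labeled data; the classifier can still place them because the unsupervised step gave them the same vectors as "guitarra" and "mesa" respectively, which is the kind of generalization the description attributes to word embeddings.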
Subject
NATURAL LANGUAGE PROCESSING
WORD EMBEDDINGS
WORD SENSE DISAMBIGUATION
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/134698

Subject classification
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
Publication date
2018-03-21
Identifier
http://hdl.handle.net/11336/134698
Citation
Cardellino, Cristian Adrián; Alemany, Laura Alonso; Exploring the impact of word embeddings for disjoint semisupervised spanish verb sense disambiguation; Sociedad Iberoamericana de Inteligencia Artificial; Inteligencia Artificial; 21; 61; 21-3-2018; 67-81
ISSN
1137-3601
1988-3064
Alternative identifiers
URL: http://journal.iberamia.org/index.php/intartif/article/view/138
DOI: 10.4114/intartif.vol21iss61pp67-81
Format
application/pdf
Publisher
Sociedad Iberoamericana de Inteligencia Artificial