Exploring the impact of word embeddings for disjoint semisupervised Spanish verb sense disambiguation
- Authors
- Cardellino, Cristian Adrián; Alemany, Laura Alonso
- Publication year
- 2018
- Language
- English
- Resource type
- article
- Status
- published version
- Description
This work explores the use of word embeddings as features for Spanish verb sense disambiguation (VSD). This type of learning technique is called disjoint semisupervised learning [21]: an unsupervised algorithm (here, the one that produces the word embeddings) is first trained separately on unlabeled data, and its output is then used by a supervised classifier. In this work we focus primarily on two aspects of VSD trained with unsupervised word representations. First, we show how the domain in which the word embeddings are trained affects the performance of the supervised task: embeddings trained on a specific domain can improve the results when that domain is shared with the supervised task, even if they are trained on smaller corpora. Second, we show that word embeddings can help the model generalize better than a model that does not use them; that is, embeddings reduce the model's tendency to overfit.
Affiliation: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba; Argentina
Affiliation: Alemany, Laura Alonso. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina
- Subject
- NATURAL LANGUAGE PROCESSING
WORD EMBEDDINGS
WORD SENSE DISAMBIGUATION
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc/2.5/ar/
- Repository
- CONICET Digital (CONICET)
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/134698
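The disjoint semisupervised setup described in the abstract (an unsupervised representation learned from unlabeled text first, then frozen and used as features by a separate supervised classifier) can be illustrated with a small sketch. It is not the authors' implementation: a count-based co-occurrence vector stands in for neural word embeddings, each verb occurrence is represented by the normalized sum of its context words' vectors, the classifier is a nearest-centroid rule, and the window size, corpus, sentences and sense labels (`beber`, `viajar` for the verb *tomar*) are all hypothetical.

```python
import math
from collections import Counter, defaultdict

WINDOW = 2  # hypothetical context window size, not taken from the paper


def normalize(vec):
    """Scale a sparse vector (dict word -> weight) to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine if both are unit length."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def train_word_vectors(unlabeled_sentences, window=WINDOW):
    """Step 1 (unsupervised): learn one vector per word from unlabeled text only.
    Each word is represented by counts of the words around it (a stand-in
    for word2vec-style embeddings)."""
    vectors = defaultdict(Counter)
    for tokens in unlabeled_sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def context_features(tokens, target, vectors, window=WINDOW):
    """Step 2: represent one verb occurrence by the normalized sum of the
    frozen vectors of its context words; unknown words contribute nothing."""
    feats = Counter()
    for j in range(max(0, target - window), min(len(tokens), target + window + 1)):
        if j != target:
            feats.update(vectors.get(tokens[j], Counter()))
    return normalize(feats)


def train_centroids(labeled):
    """Step 3 (supervised): one unit-length centroid per sense."""
    sums = defaultdict(Counter)
    for feats, sense in labeled:
        sums[sense].update(feats)
    return {sense: normalize(vec) for sense, vec in sums.items()}


def predict(centroids, feats):
    """Pick the sense whose centroid has the highest cosine similarity."""
    return max(centroids, key=lambda sense: dot(centroids[sense], feats))


# Hypothetical toy data: the unlabeled text is used only in step 1.
unlabeled = [s.split() for s in [
    "beber café caliente por la mañana",
    "beber agua fría por la tarde",
    "viajar en tren desde la estación",
    "viajar en autobús desde la terminal",
]]
vectors = train_word_vectors(unlabeled)

# Hypothetical labeled occurrences of "tomar" (target index 0).
train = [("tomar café".split(), "beber"), ("tomar el tren".split(), "viajar")]
centroids = train_centroids(
    [(context_features(toks, 0, vectors), sense) for toks, sense in train]
)

# "agua" and "autobús" never occur in the labeled data.
print(predict(centroids, context_features("tomar agua".split(), 0, vectors)))        # beber
print(predict(centroids, context_features("tomar el autobús".split(), 0, vectors)))  # viajar
```

Neither *agua* nor *autobús* appears in the labeled instances; in this toy example they are classified correctly only because the unsupervised step gave them vectors similar to those of *café* and *tren*. This mirrors, on toy data, the kind of generalization the abstract attributes to word embeddings.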
Full record metadata
| Field | Value |
|---|---|
| dc.date | 2018-03-21 |
| dc.type | info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; http://purl.org/coar/resource_type/c_6501; info:ar-repo/semantics/articulo |
| purl_subject | https://purl.org/becyt/ford/1.2; https://purl.org/becyt/ford/1 |
| dc.identifier | http://hdl.handle.net/11336/134698 |
| dc.identifier (citation) | Cardellino, Cristian Adrián; Alemany, Laura Alonso; Exploring the impact of word embeddings for disjoint semisupervised Spanish verb sense disambiguation; Sociedad Iberoamericana de Inteligencia Artificial; Inteligencia Artificial; 21; 61; 21-3-2018; 67-81 |
| dc.identifier (ISSN) | 1137-3601; 1988-3064 |
| dc.relation | http://journal.iberamia.org/index.php/intartif/article/view/138; DOI 10.4114/intartif.vol21iss61pp67-81 |
| dc.format | application/pdf |
| dc.publisher | Sociedad Iberoamericana de Inteligencia Artificial |
| dc.source | reponame:CONICET Digital (CONICET); instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.name | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
| repository.mail | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |