Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
- Autores
- Cardellino, Cristian; Alonso i Alemany, Laura
- Año de publicación
- 2017
- Idioma
- inglés
- Tipo de recurso
- documento de conferencia
- Estado
- versión publicada
- Descripción
- This work explores the use of word embeddings, also known as word vectors, trained on Spanish corpora, to use as features for Spanish verb sense disambiguation (VSD). This type of learning technique is named disjoint semisupervised learning [1]: an unsupervised algorithm is trained on unlabeled data separately as a first step, and then its results (i.e. the word embeddings) are fed to a supervised classifier. Throughout this paper we try to assert two hypothesis: (i) representations of training instances based on word embeddings improve the performance of supervised models for VSD, in contrast to more standard feature engineering techniques based on information taken from the training data; (ii) using word embeddings trained on a specific domain, in this case the same domain the labeled data is gathered from, has a positive impact on the model’s performance, when compared to general domain’s word embeddings. The performance of a model over the data is not only measured using standard metric techniques (e.g. accuracy or precision/recall) but also measuring the model tendency to overfit the available data by analyzing the learning curve. Measuring this overfitting tendency is important as there is a small amount of available data, thus we need to find models to generalize better the VSD problem. For the task we use SenSem [2], a corpus and lexicon of Spanish and Catalan disambiguated verbs, as our base resource for experimentation.
Sociedad Argentina de Informática e Investigación Operativa - Materia
-
Ciencias Informáticas
word embeddings
disjoint semisupervised learning
verb sense disambiguation - Nivel de accesibilidad
- acceso abierto
- Condiciones de uso
- http://creativecommons.org/licenses/by-sa/4.0/
- Repositorio
- Institución
- Universidad Nacional de La Plata
- OAI Identificador
- oai:sedici.unlp.edu.ar:10915/65941
Ver los metadatos del registro completo
id |
SEDICI_2713855161a7fcc585f41ec644c8fe5d |
---|---|
oai_identifier_str |
oai:sedici.unlp.edu.ar:10915/65941 |
network_acronym_str |
SEDICI |
repository_id_str |
1329 |
network_name_str |
SEDICI (UNLP) |
spelling |
Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word EmbeddingsCardellino, CristianAlonso i Alemany, LauraCiencias Informáticasword embeddingsdisjoint semisupervised learningverb sense disambiguationThis work explores the use of word embeddings, also known as word vectors, trained on Spanish corpora, to use as features for Spanish verb sense disambiguation (VSD). This type of learning technique is named disjoint semisupervised learning [1]: an unsupervised algorithm is trained on unlabeled data separately as a first step, and then its results (i.e. the word embeddings) are fed to a supervised classifier. Throughout this paper we try to assert two hypothesis: (i) representations of training instances based on word embeddings improve the performance of supervised models for VSD, in contrast to more standard feature engineering techniques based on information taken from the training data; (ii) using word embeddings trained on a specific domain, in this case the same domain the labeled data is gathered from, has a positive impact on the model’s performance, when compared to general domain’s word embeddings. The performance of a model over the data is not only measured using standard metric techniques (e.g. accuracy or precision/recall) but also measuring the model tendency to overfit the available data by analyzing the learning curve. Measuring this overfitting tendency is important as there is a small amount of available data, thus we need to find models to generalize better the VSD problem. For the task we use SenSem [2], a corpus and lexicon of Spanish and Catalan disambiguated verbs, as our base resource for experimentation.Sociedad Argentina de Informática e Investigación Operativa2017-09info:eu-repo/semantics/conferenceObjectinfo:eu-repo/semantics/publishedVersionObjeto de conferenciahttp://purl.org/coar/resource_type/c_5794info:ar-repo/semantics/documentoDeConferenciaapplication/pdf26-34http://sedici.unlp.edu.ar/handle/10915/65941enginfo:eu-repo/semantics/altIdentifier/url/http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/ASAI/asai-05.pdfinfo:eu-repo/semantics/altIdentifier/issn/2451-7585info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by-sa/4.0/Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)reponame:SEDICI (UNLP)instname:Universidad Nacional de La Platainstacron:UNLP2025-09-29T11:09:45Zoai:sedici.unlp.edu.ar:10915/65941Institucionalhttp://sedici.unlp.edu.ar/Universidad públicaNo correspondehttp://sedici.unlp.edu.ar/oai/snrdalira@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:13292025-09-29 11:09:45.368SEDICI (UNLP) - Universidad Nacional de La Platafalse |
dc.title.none.fl_str_mv |
Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings |
title |
Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings |
spellingShingle |
Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings Cardellino, Cristian Ciencias Informáticas word embeddings disjoint semisupervised learning verb sense disambiguation |
title_short |
Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings |
title_full |
Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings |
title_fullStr |
Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings |
title_full_unstemmed |
Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings |
title_sort |
Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings |
dc.creator.none.fl_str_mv |
Cardellino, Cristian Alonso i Alemany, Laura |
author |
Cardellino, Cristian |
author_facet |
Cardellino, Cristian Alonso i Alemany, Laura |
author_role |
author |
author2 |
Alonso i Alemany, Laura |
author2_role |
author |
dc.subject.none.fl_str_mv |
Ciencias Informáticas word embeddings disjoint semisupervised learning verb sense disambiguation |
topic |
Ciencias Informáticas word embeddings disjoint semisupervised learning verb sense disambiguation |
dc.description.none.fl_txt_mv |
This work explores the use of word embeddings, also known as word vectors, trained on Spanish corpora, to use as features for Spanish verb sense disambiguation (VSD). This type of learning technique is named disjoint semisupervised learning [1]: an unsupervised algorithm is trained on unlabeled data separately as a first step, and then its results (i.e. the word embeddings) are fed to a supervised classifier. Throughout this paper we try to assert two hypothesis: (i) representations of training instances based on word embeddings improve the performance of supervised models for VSD, in contrast to more standard feature engineering techniques based on information taken from the training data; (ii) using word embeddings trained on a specific domain, in this case the same domain the labeled data is gathered from, has a positive impact on the model’s performance, when compared to general domain’s word embeddings. The performance of a model over the data is not only measured using standard metric techniques (e.g. accuracy or precision/recall) but also measuring the model tendency to overfit the available data by analyzing the learning curve. Measuring this overfitting tendency is important as there is a small amount of available data, thus we need to find models to generalize better the VSD problem. For the task we use SenSem [2], a corpus and lexicon of Spanish and Catalan disambiguated verbs, as our base resource for experimentation. Sociedad Argentina de Informática e Investigación Operativa |
description |
This work explores the use of word embeddings, also known as word vectors, trained on Spanish corpora, to use as features for Spanish verb sense disambiguation (VSD). This type of learning technique is named disjoint semisupervised learning [1]: an unsupervised algorithm is trained on unlabeled data separately as a first step, and then its results (i.e. the word embeddings) are fed to a supervised classifier. Throughout this paper we try to assert two hypothesis: (i) representations of training instances based on word embeddings improve the performance of supervised models for VSD, in contrast to more standard feature engineering techniques based on information taken from the training data; (ii) using word embeddings trained on a specific domain, in this case the same domain the labeled data is gathered from, has a positive impact on the model’s performance, when compared to general domain’s word embeddings. The performance of a model over the data is not only measured using standard metric techniques (e.g. accuracy or precision/recall) but also measuring the model tendency to overfit the available data by analyzing the learning curve. Measuring this overfitting tendency is important as there is a small amount of available data, thus we need to find models to generalize better the VSD problem. For the task we use SenSem [2], a corpus and lexicon of Spanish and Catalan disambiguated verbs, as our base resource for experimentation. |
publishDate |
2017 |
dc.date.none.fl_str_mv |
2017-09 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Objeto de conferencia http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://sedici.unlp.edu.ar/handle/10915/65941 |
url |
http://sedici.unlp.edu.ar/handle/10915/65941 |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/ASAI/asai-05.pdf info:eu-repo/semantics/altIdentifier/issn/2451-7585 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-sa/4.0/ Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by-sa/4.0/ Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.format.none.fl_str_mv |
application/pdf 26-34 |
dc.source.none.fl_str_mv |
reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP |
reponame_str |
SEDICI (UNLP) |
collection |
SEDICI (UNLP) |
instname_str |
Universidad Nacional de La Plata |
instacron_str |
UNLP |
institution |
UNLP |
repository.name.fl_str_mv |
SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv |
alira@sedici.unlp.edu.ar |
_version_ |
1844615965465116672 |
score |
13.069144 |