Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings

Authors
Cardellino, Cristian; Alonso i Alemany, Laura
Year of publication
2017
Language
English
Resource type
conference paper
Status
published version
Description
This work explores the use of word embeddings, also known as word vectors, trained on Spanish corpora as features for Spanish verb sense disambiguation (VSD). This learning technique is known as disjoint semi-supervised learning [1]: an unsupervised algorithm is first trained on unlabeled data on its own, and its results (i.e. the word embeddings) are then fed to a supervised classifier. Throughout this paper we test two hypotheses: (i) representations of training instances based on word embeddings improve the performance of supervised models for VSD, in contrast to more standard feature engineering techniques based on information taken from the training data; (ii) using word embeddings trained on a specific domain, in this case the same domain the labeled data is gathered from, has a positive impact on the model's performance compared to general-domain word embeddings. The performance of a model is measured not only with standard metrics (e.g. accuracy or precision/recall) but also by analyzing the learning curve to assess the model's tendency to overfit the available data. Measuring this overfitting tendency is important because only a small amount of labeled data is available, so we need models that generalize better to the VSD problem. For the task we use SenSem [2], a corpus and lexicon of sense-disambiguated Spanish and Catalan verbs, as our base resource for experimentation.
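The pipeline described in the abstract (unsupervised embedding training on unlabeled text, followed by a supervised classifier over embedding-based features, evaluated with a learning curve) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy sentences, the mean-of-context-window featurization, and the logistic regression classifier are assumptions made here for the example, using gensim (>= 4.0) and scikit-learn.

```python
# Minimal sketch of disjoint semi-supervised VSD (illustrative only, not the
# authors' code). Assumes gensim >= 4.0 and scikit-learn; toy data stands in
# for an unlabeled Spanish corpus and for SenSem-style sense annotations.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Step 1 (unsupervised): train word embeddings on unlabeled Spanish sentences.
unlabeled_sentences = [
    ["el", "banco", "presta", "dinero"],
    ["ella", "se", "sienta", "en", "el", "banco"],
]  # in practice, a large unlabeled corpus (general or domain-specific)
emb = Word2Vec(unlabeled_sentences, vector_size=50, window=2, min_count=1)

def featurize(context_tokens):
    """Represent a verb occurrence as the mean embedding of its context window
    (one possible embedding-based representation, assumed for this sketch)."""
    vecs = [emb.wv[t] for t in context_tokens if t in emb.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(emb.vector_size)

# Step 2 (supervised): labeled verb occurrences (context window -> sense tag);
# toy stand-ins for SenSem annotations, repeated only to have enough samples.
labeled = [
    (["banco", "presta", "dinero"], "sense_1"),
    (["sienta", "en", "banco"], "sense_2"),
] * 10
X = np.vstack([featurize(ctx) for ctx, _ in labeled])
y = [sense for _, sense in labeled]

clf = LogisticRegression(max_iter=1000)

# Learning curve: train on growing fractions of the data and compare training
# vs. validation scores to gauge the tendency to overfit the small dataset.
sizes, train_scores, val_scores = learning_curve(
    clf, X, y, cv=3, train_sizes=np.linspace(0.5, 1.0, 3),
    shuffle=True, random_state=0)
print("train sizes:", sizes)
print("mean train accuracy:", train_scores.mean(axis=1))
print("mean validation accuracy:", val_scores.mean(axis=1))
```

Swapping the unlabeled corpus between a domain-specific and a general-domain collection is the variable that hypothesis (ii) in the abstract examines.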
Publisher
Sociedad Argentina de Informática e Investigación Operativa
Subject
Computer Science
word embeddings
disjoint semisupervised learning
verb sense disambiguation
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI Identifier
oai:sedici.unlp.edu.ar:10915/65941

Identifier
http://sedici.unlp.edu.ar/handle/10915/65941
Alternative identifier
http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/ASAI/asai-05.pdf
ISSN
2451-7585
Publication date
2017-09
Format
application/pdf
Pages
26-34
License
Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)