Evaluation of LSA performance in Spanish using multiple corpus of text

Authors: Carrillo, Facundo; Cecchi, Guillermo; Sigman, Mariano; Fernández Slezak, Diego
Publication year: 2013
Language: English
Resource type: conference paper
Status: published version
Description: Latent Semantic Analysis (LSA) is a natural language processing tool that allows estimating the semantic distance between terms. The success of LSA depends mainly on the choice of training corpus, which has been studied principally in English. This study examines LSA with regional Spanish corpora and evaluates its performance by identifying synonyms. We found that performance was slightly better than chance, consistent with previous results. The standard LSA method cannot dynamically increase the training corpus. By using classifiers, we combined multiple LSA models and showed that automatic classifiers increase performance.
Sociedad Argentina de Informática e Investigación Operativa
Subject: Computer Science
Latent Semantic Analysis
regional Spanish corpus
Natural Language Processing
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-sa/4.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/76358


id	SEDICI_c6017db5301060978c8ea27ca7c7fefb
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/76358
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
dc.title.none.fl_str_mv	Evaluation of LSA performance in Spanish using multiple corpus of text
dc.creator.none.fl_str_mv	Carrillo, Facundo Cecchi, Guillermo Sigman, Mariano Fernández Slezak, Diego
author	Carrillo, Facundo
author_facet	Carrillo, Facundo Cecchi, Guillermo Sigman, Mariano Fernández Slezak, Diego
author_role	author
author2	Cecchi, Guillermo Sigman, Mariano Fernández Slezak, Diego
author2_role	author author author
dc.subject.none.fl_str_mv	Ciencias Informáticas Latent Semantic Analysis regional Spanish corpus Natural Language Processing
topic	Ciencias Informáticas Latent Semantic Analysis regional Spanish corpus Natural Language Processing
publishDate	2013
dc.date.none.fl_str_mv	2013-09
dc.type.none.fl_str_mv	info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Objeto de conferencia http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia
format	conferenceObject
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/76358
url	http://sedici.unlp.edu.ar/handle/10915/76358
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/http://42jaiio.sadio.org.ar/proceedings/simposios/Trabajos/ASAI/18.pdf info:eu-repo/semantics/altIdentifier/issn/1850-2784
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-sa/4.0/ Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-sa/4.0/ Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.format.none.fl_str_mv	application/pdf 198-201
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP
reponame_str	SEDICI (UNLP)
collection	SEDICI (UNLP)
instname_str	Universidad Nacional de La Plata
instacron_str	UNLP
institution	UNLP
repository.name.fl_str_mv	SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar
_version_	1866371576062017536
score	13.468372
