Evaluation of LSA performance in Spanish using multiple corpus of text

Authors
Carrillo, Facundo; Cecchi, Guillermo; Sigman, Mariano; Fernández Slezak, Diego
Publication year
2013
Language
English
Resource type
conference paper
Status
published version
Description
Latent Semantic Analysis (LSA) is a natural language processing tool that estimates the semantic distance between terms. The success of LSA depends mainly on the choice of training corpus, which has been studied principally in English. This study examines LSA trained on regional Spanish corpora and evaluates its performance on synonym identification. We found that performance was slightly better than chance, in agreement with previous results. The standard LSA method cannot dynamically extend the training corpus. By using classifiers to combine multiple LSA models, we showed that automatic classifiers increase performance.
Sociedad Argentina de Informática e Investigación Operativa
Subject
Computer Science
Latent Semantic Analysis
regional Spanish corpus
Natural Language Processing
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/76358

id SEDICI_c6017db5301060978c8ea27ca7c7fefb
oai_identifier_str oai:sedici.unlp.edu.ar:10915/76358
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
dc.title.none.fl_str_mv Evaluation of LSA performance in Spanish using multiple corpus of text
dc.creator.none.fl_str_mv Carrillo, Facundo
Cecchi, Guillermo
Sigman, Mariano
Fernández Slezak, Diego
dc.subject.none.fl_str_mv Computer Science
Latent Semantic Analysis
regional Spanish corpus
Natural Language Processing
dc.description.none.fl_txt_mv Latent Semantic Analysis (LSA) is a natural language processing tool that estimates the semantic distance between terms. The success of LSA depends mainly on the choice of training corpus, which has been studied principally in English. This study examines LSA trained on regional Spanish corpora and evaluates its performance on synonym identification. We found that performance was slightly better than chance, in agreement with previous results. The standard LSA method cannot dynamically extend the training corpus. By using classifiers to combine multiple LSA models, we showed that automatic classifiers increase performance.
Sociedad Argentina de Informática e Investigación Operativa
publishDate 2013
dc.date.none.fl_str_mv 2013-09
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Conference object
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
format conferenceObject
status_str publishedVersion
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/76358
dc.language.none.fl_str_mv eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/http://42jaiio.sadio.org.ar/proceedings/simposios/Trabajos/ASAI/18.pdf
info:eu-repo/semantics/altIdentifier/issn/1850-2784
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-sa/4.0/
Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.format.none.fl_str_mv application/pdf
198-201
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar