Evaluation of LSA performance in Spanish using multiple corpus of text
- Authors
- Carrillo, Facundo; Cecchi, Guillermo; Sigman, Mariano; Fernández Slezak, Diego
- Publication year
- 2013
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Latent Semantic Analysis (LSA) is a natural language processing tool that estimates the semantic distance between terms. The success of LSA depends mainly on the choice of training corpus, which has been studied principally in English. This study examines LSA trained on regional Spanish corpora and evaluates its performance on a synonym identification task. We found that performance was slightly better than chance, consistent with previous results. The standard LSA method cannot dynamically extend the training corpus. By using classifiers, we combined multiple LSA models and showed that automatic classifiers increase performance.
Sociedad Argentina de Informática e Investigación Operativa
- Subject
- Computer Science (Ciencias Informáticas); Latent Semantic Analysis; regional Spanish corpus; Natural Language Processing
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-sa/4.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/76358
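The description in this record says that LSA estimates the semantic distance between terms and that several LSA models were combined through classifiers. The following is a minimal sketch of that general idea, not the authors' implementation: the toy corpora, the function names (`build_lsa`, `similarity`, `features`), and the choice of `k = 2` dimensions are all assumptions made for illustration. Each model is a truncated SVD of a term-document count matrix, term similarity is the cosine between reduced term vectors, and the per-model similarities form a feature vector that a classifier of any kind could take as input.

```python
import numpy as np

def build_lsa(docs, k):
    """Build a toy LSA model: (word -> row index, k-dimensional term vectors)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            counts[index[w], j] += 1.0          # raw term-document counts
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    return index, u[:, :k] * s[:k]              # keep the top-k latent dimensions

def similarity(model, a, b):
    """Cosine similarity of two terms; 0.0 if either term is unknown to the model."""
    index, vecs = model
    if a not in index or b not in index:
        return 0.0
    va, vb = vecs[index[a]], vecs[index[b]]
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom > 0 else 0.0

def features(models, a, b):
    """One similarity per LSA model: a feature vector for a downstream classifier."""
    return [similarity(m, a, b) for m in models]

# Hypothetical toy corpora (accents omitted); real corpora would be far larger.
corpus_a = [
    "el auto rojo corre rapido",
    "el coche rojo corre rapido",
    "la casa grande tiene jardin",
    "la vivienda grande tiene jardin",
]
corpus_b = [
    "auto coche vehiculo motor",
    "casa vivienda hogar techo",
    "motor vehiculo ruta",
]

model_a = build_lsa(corpus_a, k=2)
model_b = build_lsa(corpus_b, k=2)

print(features([model_a, model_b], "auto", "coche"))  # candidate synonym pair
print(features([model_a, model_b], "auto", "casa"))   # unrelated pair
```

Because each model is trained separately, a new corpus can be added as one more model (one more feature) without recomputing the existing decompositions, which is one way to work around the fixed training corpus of a single LSA model.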
Full record metadata
id | SEDICI_c6017db5301060978c8ea27ca7c7fefb |
---|---|
oai_identifier_str | oai:sedici.unlp.edu.ar:10915/76358 |
network_acronym_str | SEDICI |
repository_id_str | 1329 |
network_name_str | SEDICI (UNLP) |
dc.title.none.fl_str_mv | Evaluation of LSA performance in Spanish using multiple corpus of text |
title | Evaluation of LSA performance in Spanish using multiple corpus of text |
spellingShingle | Evaluation of LSA performance in Spanish using multiple corpus of text; Carrillo, Facundo; Ciencias Informáticas; Latent Semantic Analysis; regional Spanish corpus; Natural Language Processing |
title_short | Evaluation of LSA performance in Spanish using multiple corpus of text |
title_full | Evaluation of LSA performance in Spanish using multiple corpus of text |
title_fullStr | Evaluation of LSA performance in Spanish using multiple corpus of text |
title_full_unstemmed | Evaluation of LSA performance in Spanish using multiple corpus of text |
title_sort | Evaluation of LSA performance in Spanish using multiple corpus of text |
dc.creator.none.fl_str_mv | Carrillo, Facundo; Cecchi, Guillermo; Sigman, Mariano; Fernández Slezak, Diego |
author | Carrillo, Facundo |
author_facet | Carrillo, Facundo; Cecchi, Guillermo; Sigman, Mariano; Fernández Slezak, Diego |
author_role | author |
author2 | Cecchi, Guillermo; Sigman, Mariano; Fernández Slezak, Diego |
author2_role | author; author; author |
dc.subject.none.fl_str_mv | Ciencias Informáticas; Latent Semantic Analysis; regional Spanish corpus; Natural Language Processing |
topic | Ciencias Informáticas; Latent Semantic Analysis; regional Spanish corpus; Natural Language Processing |
dc.description.none.fl_txt_mv | Latent Semantic Analysis (LSA) is a natural language processing tool that estimates the semantic distance between terms. The success of LSA depends mainly on the choice of training corpus, which has been studied principally in English. This study examines LSA trained on regional Spanish corpora and evaluates its performance on a synonym identification task. We found that performance was slightly better than chance, consistent with previous results. The standard LSA method cannot dynamically extend the training corpus. By using classifiers, we combined multiple LSA models and showed that automatic classifiers increase performance. Sociedad Argentina de Informática e Investigación Operativa |
description | Latent Semantic Analysis (LSA) is a natural language processing tool that estimates the semantic distance between terms. The success of LSA depends mainly on the choice of training corpus, which has been studied principally in English. This study examines LSA trained on regional Spanish corpora and evaluates its performance on a synonym identification task. We found that performance was slightly better than chance, consistent with previous results. The standard LSA method cannot dynamically extend the training corpus. By using classifiers, we combined multiple LSA models and showed that automatic classifiers increase performance. |
publishDate | 2013 |
dc.date.none.fl_str_mv | 2013-09 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Objeto de conferencia; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
format | conferenceObject |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/76358 |
url | http://sedici.unlp.edu.ar/handle/10915/76358 |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/http://42jaiio.sadio.org.ar/proceedings/simposios/Trabajos/ASAI/18.pdf; info:eu-repo/semantics/altIdentifier/issn/1850-2784 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-sa/4.0/; Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | http://creativecommons.org/licenses/by-sa/4.0/; Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) |
dc.format.none.fl_str_mv | application/pdf; 198-201 |
dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
reponame_str | SEDICI (UNLP) |
collection | SEDICI (UNLP) |
instname_str | Universidad Nacional de La Plata |
instacron_str | UNLP |
institution | UNLP |
repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |
_version_ | 1846064108526370816 |
score | 13.22299 |