Keyword Identification in Spanish Documents

Authors
Aquino, Germán Osvaldo; Lanzarini, Laura Cristina
Publication year
2015
Language
English
Resource type
article
Status
published version
Description
The large amount of textual information digitally available today gives rise to the need for effective means of indexing, searching and retrieving this information. Keywords are used to describe briefly and precisely the contents of a textual document. In this paper we present an algorithm for keyword extraction from documents written in Spanish. This algorithm combines autoencoders, which are adequate for highly unbalanced classification problems, with the discriminative power of conventional binary classifiers. In order to improve its performance on larger and more diverse datasets, our algorithm trains several models of each kind through bagging.
Affiliation: Aquino, Germán Osvaldo. Universidad Nacional de La Plata. Facultad de Informatica. Instituto de Investigación En Informatica Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Lanzarini, Laura Cristina. Universidad Nacional de La Plata. Facultad de Informatica. Instituto de Investigación En Informatica Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Subject
KEYWORD EXTRACTION
NEURAL NETWORKS
AUTOENCODERS
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/57325

Subject classification
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
Publication date
2015-12
Handle
http://hdl.handle.net/11336/57325
Citation
Aquino, Germán Osvaldo; Lanzarini, Laura Cristina; Keyword Identification in Spanish Documents; Universidad Nacional de La Plata. Facultad de Informática; Journal of Computer Science & Technology; 15; 2; 12-2015; 55-60
ISSN
1666-6046
1666-6038
Alternate URL
http://journal.info.unlp.edu.ar/JCST/article/view/554
Format
application/pdf
Publisher
Universidad Nacional de La Plata. Facultad de Informática
Repository contact
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar