Keyword Identification in Spanish Documents

Authors: Aquino, Germán Osvaldo; Lanzarini, Laura Cristina
Publication year: 2015
Language: English
Resource type: article
Status: published version
Description: The large amount of textual information digitally available today gives rise to the need for effective means of indexing, searching and retrieving this information. Keywords are used to describe briefly and precisely the contents of a textual document. In this paper we present an algorithm for keyword extraction from documents written in Spanish. This algorithm combines autoencoders, which are adequate for highly unbalanced classification problems, with the discriminative power of conventional binary classifiers. In order to improve its performance on larger and more diverse datasets, our algorithm trains several models of each kind through bagging.
Affiliation: Aquino, Germán Osvaldo. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática LIDI; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Lanzarini, Laura Cristina. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática LIDI; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Subject: KEYWORD EXTRACTION
NEURAL NETWORKS
AUTOENCODERS
Access level: open access
Terms of use: https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
Institution: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier: oai:ri.conicet.gov.ar:11336/57325

Citation: Aquino, Germán Osvaldo; Lanzarini, Laura Cristina; Keyword Identification in Spanish Documents; Universidad Nacional de La Plata. Facultad de Informática; Journal of Computer Science & Technology; 15; 2; 12-2015; 55-60
ISSN: 1666-6046; 1666-6038
Handle: http://hdl.handle.net/11336/57325
Alternate URL: http://journal.info.unlp.edu.ar/JCST/article/view/554
Publisher: Universidad Nacional de La Plata. Facultad de Informática
