Keyword Identification in Spanish Documents using Neural Networks

Authors: Aquino, Germán Osvaldo; Lanzarini, Laura Cristina
Publication year: 2015
Language: English
Resource type: conference paper
Status: published version
Description: The large amount of textual information digitally available today gives rise to the need for effective means of indexing, searching and retrieving this information. Keywords are used to describe briefly and precisely the contents of a textual document. In this paper we present an algorithm for keyword extraction from documents written in Spanish. This algorithm combines autoencoders, which are adequate for highly unbalanced classification problems, with the discriminative power of conventional binary classifiers. In order to improve its performance on larger and more diverse datasets, our algorithm trains several models of each kind through bagging.
XII Workshop Bases de Datos y Minería de Datos (WBDDM)
Red de Universidades con Carreras en Informática (RedUNCI)
Subject: Computer Science
keyword extraction
autoencoders
Neural nets
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/50434

Publication date: 2015-10
URL: http://sedici.unlp.edu.ar/handle/10915/50434
ISBN: 978-987-3806-05-6
Reference: hdl:10915/50028
Format: application/pdf
License: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Argentina (CC BY-NC-SA 2.5)
Repository contact: alira@sedici.unlp.edu.ar
