Rebuilding the Story of a Hero: Information Extraction in Ancient Argentinian Texts

Authors: Xamena, Eduardo; Marmanillo, Walter Gabriel; Mechaca, Ana Lidia
Publication year: 2019
Language: English
Resource type: conference paper
Status: published version
Description: Large numbers of ancient documents on Argentinian history have become available in recent years, making it possible to find interesting and useful aggregated information. This work proposes applying Natural Language Processing, Text Mining and Visualization tools to repositories of ancient Argentinian documents. Conceptual maps and entity networks are the first target of this preliminary paper. The first step is the normalization of OCR-acquired books of General Güemes. Exploratory analyses reveal numerous spelling errors caused by the OCR acquisition of the volumes. We propose automatic ways of overcoming this issue during normalization. In addition, a first topic landscape of a subset of volumes is obtained and analysed via Topic Modelling tools.
Sociedad Argentina de Informática e Investigación Operativa
Subject: Computer Science
Argentinian history
Natural language processing
Text mining
Visualization
Big document repositories
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-nc-sa/3.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/87809

id	SEDICI_349107fb928f5390b92717973d40014e
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/87809
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
dc.title.none.fl_str_mv	Rebuilding the Story of a Hero: Information Extraction in Ancient Argentinian Texts
dc.creator.none.fl_str_mv	Xamena, Eduardo Marmanillo, Walter Gabriel Mechaca, Ana Lidia
author	Xamena, Eduardo
author_facet	Xamena, Eduardo Marmanillo, Walter Gabriel Mechaca, Ana Lidia
author_role	author
author2	Marmanillo, Walter Gabriel Mechaca, Ana Lidia
author2_role	author author
dc.subject.none.fl_str_mv	Ciencias Informáticas Argentinian history Natural language processing TextMining Visualization Big document repositories
topic	Ciencias Informáticas Argentinian history Natural language processing TextMining Visualization Big document repositories
publishDate	2019
dc.date.none.fl_str_mv	2019-09
dc.type.none.fl_str_mv	info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Objeto de conferencia http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia
format	conferenceObject
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/87809
url	http://sedici.unlp.edu.ar/handle/10915/87809
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/issn/2683-8966
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-sa/3.0/ Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-sa/3.0/ Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
dc.format.none.fl_str_mv	application/pdf 28-37
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP
reponame_str	SEDICI (UNLP)
collection	SEDICI (UNLP)
instname_str	Universidad Nacional de La Plata
instacron_str	UNLP
institution	UNLP
repository.name.fl_str_mv	SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar