Rebuilding the Story of a Hero: Information Extraction in Ancient Argentinian Texts

Authors
Xamena, Eduardo; Marmanillo, Walter Gabriel; Mechaca, Ana Lidia
Publication year
2019
Language
English
Resource type
conference paper
Status
published version
Description
Large amounts of ancient documents regarding Argentinian history have become available in recent years. This makes it possible to find interesting and useful aggregated information. This work proposes the application of Natural Language Processing, Text Mining and Visualization tools to repositories of ancient Argentinian documents. Conceptual maps and entity networks are the first target of this preliminary paper. The first step is the normalization of OCR-acquired books of General Güemes. Exploratory analyses reveal numerous spelling errors caused by the OCR acquisition process of the volumes. We propose smart automatic ways of overcoming this issue during normalization. In addition, a first topic landscape of a subset of volumes is obtained and analysed via Topic Modelling tools.
Sociedad Argentina de Informática e Investigación Operativa
Subject
Computer Science
Argentinian history
Natural language processing
Text mining
Visualization
Big document repositories
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/3.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/87809

Publication date
2019-09
Pages
28-37
Format
application/pdf
ISSN
2683-8966
URL
http://sedici.unlp.edu.ar/handle/10915/87809
License
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
Repository contact
alira@sedici.unlp.edu.ar