Rebuilding the Story of a Hero: Information Extraction in Ancient Argentinian Texts
- Authors
- Xamena, Eduardo; Marmanillo, Walter Gabriel; Mechaca, Ana Lidia
- Publication year
- 2019
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- A large number of ancient documents on Argentinian history have become available in recent years, which makes it possible to extract interesting and useful aggregated information. This work proposes applying Natural Language Processing, Text Mining and Visualization tools to repositories of ancient Argentinian documents. Conceptual maps and entity networks are the first target of this preliminary paper. The first step is the normalization of OCR-acquired books of General Güemes. Exploratory analyses reveal numerous spelling errors caused by the OCR acquisition of the volumes. We propose smart automatic methods for overcoming this issue during normalization. In addition, a first topic landscape of a subset of the volumes is obtained and analysed using Topic Modelling tools.
Sociedad Argentina de Informática e Investigación Operativa
- Subject
- Computer Science; Argentinian history; Natural language processing; Text mining; Visualization; Big document repositories
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/3.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/87809
Full record metadata
| Field | Value |
|---|---|
| id | SEDICI_349107fb928f5390b92717973d40014e |
| oai_identifier_str | oai:sedici.unlp.edu.ar:10915/87809 |
| network_acronym_str | SEDICI |
| repository_id_str | 1329 |
| network_name_str | SEDICI (UNLP) |
| dc.title.none.fl_str_mv | Rebuilding the Story of a Hero: Information Extraction in Ancient Argentinian Texts |
| dc.creator.none.fl_str_mv | Xamena, Eduardo; Marmanillo, Walter Gabriel; Mechaca, Ana Lidia |
| author | Xamena, Eduardo |
| author2 | Marmanillo, Walter Gabriel; Mechaca, Ana Lidia |
| dc.subject.none.fl_str_mv | Ciencias Informáticas; Argentinian history; Natural language processing; TextMining; Visualization; Big document repositories |
| dc.description.none.fl_txt_mv | A large number of ancient documents on Argentinian history have become available in recent years, which makes it possible to extract interesting and useful aggregated information. This work proposes applying Natural Language Processing, Text Mining and Visualization tools to repositories of ancient Argentinian documents. Conceptual maps and entity networks are the first target of this preliminary paper. The first step is the normalization of OCR-acquired books of General Güemes. Exploratory analyses reveal numerous spelling errors caused by the OCR acquisition of the volumes. We propose smart automatic methods for overcoming this issue during normalization. In addition, a first topic landscape of a subset of the volumes is obtained and analysed using Topic Modelling tools. Sociedad Argentina de Informática e Investigación Operativa |
| publishDate | 2019 |
| dc.date.none.fl_str_mv | 2019-09 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Objeto de conferencia; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
| format | conferenceObject |
| status_str | publishedVersion |
| dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/87809 |
| dc.language.none.fl_str_mv | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/issn/2683-8966 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/3.0/; Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) |
| eu_rights_str_mv | openAccess |
| dc.format.none.fl_str_mv | application/pdf; 28-37 |
| dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
| repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
| repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |