Text pre-processing tool to increase the exactness of experimental results in summarization solutions

Villa Monte, Augusto; Corvi, Julieta Pilar; Lanzarini, Laura Cristina; Puente, Cristina; Cuevas, Alfredo Simón; Olivas, José A.
Publication Year
conference paper
Published version
For years, and even more so today thanks to the ease of access to information, countless scientific documents covering all branches of human knowledge have been produced. These documents, consisting mostly of text, are stored in digital libraries that increasingly allow access and manipulation. This has allowed these document repositories to be used for research of great interest, particularly work related to the evaluation of automatic summaries through experimentation. In this area of computer science, the experimental results of many published works are obtained using document collections, some well known and others less so, but without specifying all the special considerations needed to achieve those results. This makes comparisons between experiments unfair and prevents objective conclusions from being drawn. This paper presents a text document manipulation tool to increase the exactness of results when obtaining, evaluating and comparing automatic summaries from different corpora. The work was motivated by the need for a tool that processes documents, splits their content properly and ensures that each text snippet does not lose its contextual information. The proposed model has been applied successfully to a set of free-access scientific papers.
XV Workshop Bases de Datos y Minería de Datos (WBDDM)
Red de Universidades con Carreras en Informática (RedUNCI)
Ciencias Informáticas
automatic summarization
extractive approaches
web scraping
document representation
summaries evaluation
Access level
Open access
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Universidad Nacional de La Plata
OAI Identifier