Text pre-processing tool to increase the exactness of experimental results in summarization solutions

Authors
Villa Monte, Augusto; Corvi, Julieta Pilar; Lanzarini, Laura Cristina; Puente, Crisitina; Cuevas, Alfredo Simón; Olivas, José A.
Publication year
2018
Language
English
Resource type
conference paper
Status
published version
Description
For years, and even more so today given the ease of access to information, countless scientific documents covering every branch of human knowledge have been produced. These documents, consisting mostly of text, are stored in digital libraries that increasingly allow access and manipulation. This has allowed these document repositories to be used in research of great interest, particularly work related to the evaluation of automatic summaries through experimentation. In this area of computer science, the experimental results of many published works are obtained using document collections, some well known and others less so, without specifying all the special considerations needed to achieve those results. This makes experimental comparisons unfair and prevents objective conclusions from being drawn. This paper presents a text document manipulation tool to increase the exactness of results when obtaining, evaluating and comparing automatic summaries from different corpora. The work was motivated by the need for a tool that processes documents, splits their content properly and ensures that each text snippet does not lose its contextual information. The proposed model was successfully applied to a set of free-access scientific papers.
XV Workshop Bases de Datos y Minería de Datos (WBDDM)
Red de Universidades con Carreras en Informática (RedUNCI)
Subject
Computer Science
automatic summarization
extractive approaches
web scraping
document representation
summaries evaluation
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/73228

id SEDICI_43a971e5e1befb759e8daff6e1fb23b8
oai_identifier_str oai:sedici.unlp.edu.ar:10915/73228
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
dc.title.none.fl_str_mv Text pre-processing tool to increase the exactness of experimental results in summarization solutions
dc.creator.none.fl_str_mv Villa Monte, Augusto
Corvi, Julieta Pilar
Lanzarini, Laura Cristina
Puente, Crisitina
Cuevas, Alfredo Simón
Olivas, José A.
dc.subject.none.fl_str_mv Ciencias Informáticas
automatic summarization
extractive approaches
web scraping
document representation
summaries evaluation
publishDate 2018
dc.date.none.fl_str_mv 2018-10
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Objeto de conferencia
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
format conferenceObject
status_str publishedVersion
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/73228
url http://sedici.unlp.edu.ar/handle/10915/73228
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/isbn/978-950-658-472-6
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv application/pdf
481-490
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar
_version_ 1842260315328217088
score 13.13397