Text pre-processing tool to increase the exactness of experimental results in summarization solutions

Authors: Villa Monte, Augusto; Corvi, Julieta Pilar; Lanzarini, Laura Cristina; Puente, Cristina; Cuevas, Alfredo Simón; Olivas, José A.
Year of publication: 2018
Language: English
Resource type: conference document
Status: published version
Description: For years, and now more than ever because of the ease of access to information, countless scientific documents covering all branches of human knowledge have been generated. These documents, consisting mostly of text, are stored in digital libraries that increasingly allow access and manipulation. This has allowed these document repositories to be used for research of great interest, particularly research related to the evaluation of automatic summaries through experimentation. In this area of computer science, the experimental results of many published works are obtained using document collections, some well known and others less so, but without specifying all the special considerations needed to achieve those results. This makes comparisons between experiments unfair and prevents objective conclusions from being drawn. This paper presents a text document manipulation tool to increase the exactness of results when obtaining, evaluating and comparing automatic summaries from different corpora. This work was motivated by the need for a tool that can process documents, split their content properly and ensure that each text snippet does not lose its contextual information. The proposed model has been applied successfully to a set of free-access scientific papers.
XV Workshop Bases de Datos y Minería de Datos (WBDDM)
Red de Universidades con Carreras en Informática (RedUNCI)
Subject: Ciencias Informáticas (Computer Science)
automatic summarization
extractive approaches
web scraping
document representation
summaries evaluation
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/73228

id	SEDICI_43a971e5e1befb759e8daff6e1fb23b8
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/73228
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
dc.title.none.fl_str_mv	Text pre-processing tool to increase the exactness of experimental results in summarization solutions
title	Text pre-processing tool to increase the exactness of experimental results in summarization solutions
dc.creator.none.fl_str_mv	Villa Monte, Augusto; Corvi, Julieta Pilar; Lanzarini, Laura Cristina; Puente, Cristina; Cuevas, Alfredo Simón; Olivas, José A.
author	Villa Monte, Augusto
author_facet	Villa Monte, Augusto; Corvi, Julieta Pilar; Lanzarini, Laura Cristina; Puente, Cristina; Cuevas, Alfredo Simón; Olivas, José A.
author_role	author
author2	Corvi, Julieta Pilar; Lanzarini, Laura Cristina; Puente, Cristina; Cuevas, Alfredo Simón; Olivas, José A.
author2_role	author author author author author
dc.subject.none.fl_str_mv	Ciencias Informáticas; automatic summarization; extractive approaches; web scraping; document representation; summaries evaluation
topic	Ciencias Informáticas; automatic summarization; extractive approaches; web scraping; document representation; summaries evaluation
publishDate	2018
dc.date.none.fl_str_mv	2018-10
dc.type.none.fl_str_mv	info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Objeto de conferencia; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia
format	conferenceObject
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/73228
url	http://sedici.unlp.edu.ar/handle/10915/73228
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/isbn/978-950-658-472-6
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv	application/pdf; 481-490
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP
reponame_str	SEDICI (UNLP)
collection	SEDICI (UNLP)
instname_str	Universidad Nacional de La Plata
instacron_str	UNLP
institution	UNLP
repository.name.fl_str_mv	SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar