Text pre-processing tool to increase the exactness of experimental results in summarization solutions
- Authors
- Villa Monte, Augusto; Corvi, Julieta Pilar; Lanzarini, Laura Cristina; Puente, Crisitina; Cuevas, Alfredo Simón; Olivas, José A.
- Publication year
- 2018
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- For years, and now more than ever given the ease of access to information, countless scientific documents covering all branches of human knowledge have been generated. These documents, consisting mostly of text, are stored in digital libraries that increasingly allow access and manipulation. This has allowed these document repositories to be used for research of great interest, particularly work on evaluating automatic summaries through experimentation. In this area of computer science, the experimental results of many published works are obtained using document collections, some well known and others less so, but without specifying all the special considerations needed to achieve those results. This leads to unfair comparisons between experiments and prevents objective conclusions. This paper presents a text document manipulation tool to increase the exactness of results when obtaining, evaluating and comparing automatic summaries from different corpora. This work was motivated by the need for a tool that can process documents, split their content properly and ensure that each text snippet does not lose its contextual information. Applying the proposed model to a set of free-access scientific papers has been successful.
XV Workshop Bases de Datos y Minería de Datos (WBDDM)
Red de Universidades con Carreras en Informática (RedUNCI)
- Subject
- Computer Science
automatic summarization
extractive approaches
web scraping
document representation
summaries evaluation
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/73228
Full record metadata
| id | SEDICI_43a971e5e1befb759e8daff6e1fb23b8 |
|---|---|
| oai_identifier_str | oai:sedici.unlp.edu.ar:10915/73228 |
| network_acronym_str | SEDICI |
| repository_id_str | 1329 |
| network_name_str | SEDICI (UNLP) |
| dc.date.none.fl_str_mv | 2018-10 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Objeto de conferencia; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
| dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/73228 |
| dc.language.none.fl_str_mv | eng |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/isbn/978-950-658-472-6 |
| dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
| dc.format.none.fl_str_mv | application/pdf; 481-490 |
| dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
| repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
| repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |
| _version_ | 1842260315328217088 |
| score | 13.13397 |