Extraction of geographic entities from biological textual sources

Authors: Acuña-Chaves, Moises A.; Araya-Monge, José E.
Publication year: 2017
Language: English
Resource type: conference paper
Status: published version
Description: This work explores and applies entity extraction techniques to identify and encode the geographic locations mentioned in the geographic distribution sections of botanical documents, such as the manual of plant species of Costa Rica. Achieving this requires combining several technologies. One is Natural Language Processing (NLP), which supports entity extraction through gazetteers; another is rule-based processing (regular expressions, deterministic automata, context-free grammars). In addition to identification and encoding, the work presents an algorithm that binds the extracted place names to authoritative sources such as gazetteers. The algorithm identifies place names and enriches the input text with additional information extracted from the semi-structured paragraphs in which the distribution is described. The values of interest are the world distribution and the Costa Rica distribution. Once identified, this information can be processed for diverse applications, such as geographic information systems, and may be useful to other research projects. The evaluation consists of manually judging a randomly selected sample of the results to determine whether the algorithm yields useful data. Each judgment rates the world and Costa Rica distributions against the source context using one of three values: GOOD, BAD, or UNKNOWN; the goal is to minimize the percentage of BAD results. The algorithm performs relatively well at geocoding and binding the world distribution; more work is needed for the Costa Rica distribution.
Sociedad Argentina de Informática e Investigación Operativa (SADIO)
Subject: Computer Science
extraction techniques
Natural Language Processing
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-sa/3.0/ (Creative Commons Attribution-ShareAlike 3.0 Unported, CC BY-SA 3.0)
Repository: SEDICI (UNLP)
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/63263

Access
Handle: http://sedici.unlp.edu.ar/handle/10915/63263
Full text (PDF): http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/SLMDI/SLMDI-10.pdf
Date: 2017-09
Format: application/pdf
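
The description combines gazetteer lookup with rules such as regular expressions to find place names in a distribution paragraph and bind them to an authoritative source. The Python sketch below illustrates that general idea under loudly stated assumptions: the six-entry `GAZETTEER`, the "In Costa Rica:" marker used to separate the world and Costa Rica distributions, the sample paragraph, and the helper names `split_distribution`, `bind_places`, and `judgment_percentages` are all hypothetical and are not taken from the paper. It is not the authors' algorithm; it only shows longest-name-first gazetteer matching with a regular expression, a rule-based split of a distribution paragraph, and the GOOD/BAD/UNKNOWN percentage tally used in the evaluation.

```python
import re
from collections import Counter

# HYPOTHETICAL mini-gazetteer: lowercase place name -> (code, level).
# A real system would bind against an authoritative gazetteer instead.
GAZETTEER = {
    "mexico": ("MX", "country"),
    "panama": ("PA", "country"),
    "colombia": ("CO", "country"),
    "costa rica": ("CR", "country"),
    "guanacaste": ("CR-G", "province"),
    "puntarenas": ("CR-P", "province"),
}

# One case-insensitive alternation over all gazetteer names. Longer names
# are tried first so a multi-word name wins over a shorter name inside it.
_NAMES = sorted(GAZETTEER, key=len, reverse=True)
PLACE_RE = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in _NAMES) + r")\b",
    re.IGNORECASE,
)

# HYPOTHETICAL rule: the phrase "In Costa Rica" (optionally followed by ":")
# separates the world distribution from the Costa Rica distribution.
CR_MARKER_RE = re.compile(r"\bIn Costa Rica\b\s*:?", re.IGNORECASE)


def split_distribution(paragraph):
    """Split a distribution paragraph into (world_text, costa_rica_text)."""
    parts = CR_MARKER_RE.split(paragraph, maxsplit=1)
    world = parts[0].strip()
    costa_rica = parts[1].strip() if len(parts) > 1 else ""
    return world, costa_rica


def bind_places(text):
    """Return each gazetteer name found in text, bound to its entry."""
    bound = []
    for match in PLACE_RE.finditer(text):
        code, level = GAZETTEER[match.group(1).lower()]
        bound.append({"name": match.group(1), "code": code, "level": level})
    return bound


def judgment_percentages(judgments):
    """Percentage of GOOD, BAD and UNKNOWN labels in a judged sample."""
    if not judgments:
        raise ValueError("empty sample")
    counts = Counter(judgments)
    total = len(judgments)
    return {label: 100.0 * counts[label] / total
            for label in ("GOOD", "BAD", "UNKNOWN")}


if __name__ == "__main__":
    # HYPOTHETICAL distribution paragraph, not quoted from the manual.
    text = "Mexico to Colombia. In Costa Rica: Guanacaste and Puntarenas, 0-1200 m."
    world, costa_rica = split_distribution(text)
    print([p["code"] for p in bind_places(world)])       # ['MX', 'CO']
    print([p["code"] for p in bind_places(costa_rica)])  # ['CR-G', 'CR-P']
    print(judgment_percentages(["GOOD", "GOOD", "BAD", "UNKNOWN"]))
    # {'GOOD': 50.0, 'BAD': 25.0, 'UNKNOWN': 25.0}
```

In this sketch a range such as "Mexico to Colombia" is bound only at its endpoints; expanding ranges or resolving abbreviated place names would need further rules, for example the automata or grammars the description mentions.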
