Evaluation of Named Entity Recognition in Historical Argentinian Documents

Authors: Darfe, Facundo; Xamena, Eduardo; Orozco, Carlos I.
Publication year: 2022
Language: English
Resource type: conference paper
Status: published version
Description: Research on volumes of historical text can be supported by automatic tools that help historians reach more abstract and aggregated points of view. Tasks such as Information Extraction or Text Mining can be performed more efficiently when Machine Learning models are employed. We propose an evaluation of different state-of-the-art models over a new dataset for Named Entity Recognition. The dataset was built from a volume of history texts about General Güemes, a national hero of Argentinian independence. The results show that some models perform better in terms of precision, recall and F1-score for most types of entities. Specifically, pretrained language models fine-tuned for this particular task show considerably higher performance than classical models based on word embeddings and other kinds of representations. In addition, statistical tests are provided to establish the significance of the differences in the performance values attained. Hence, the contribution of this work is twofold: on the one hand, a new corpus and dataset for Named Entity Recognition, and on the other, a complete statistical assessment of the performance of state-of-the-art models over the generated dataset.
Sociedad Argentina de Informática e Investigación Operativa
Subject: Computer Science
Named Entity Recognition and Classification
Argentinian History
Pretrained Language Models
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/151702

id	SEDICI_c0e08642edfe0248c103a1005f680cca
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/151702
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
dc.title.none.fl_str_mv	Evaluation of Named Entity Recognition in Historical Argentinian Documents
dc.creator.none.fl_str_mv	Darfe, Facundo Xamena, Eduardo Orozco, Carlos I.
dc.subject.none.fl_str_mv	Ciencias Informáticas Named Entity Recognition and Classification Argentinian History Pretrained Language Models
dc.date.none.fl_str_mv	2022-10
dc.type.none.fl_str_mv	info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Objeto de conferencia http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/151702
dc.language.none.fl_str_mv	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/270/221 info:eu-repo/semantics/altIdentifier/issn/2451-7496
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv	application/pdf 98-109
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar