Evaluation of Named Entity Recognition in Historical Argentinian Documents
- Authors
- Darfe, Facundo; Xamena, Eduardo; Orozco, Carlos I.
- Publication year
- 2022
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Research on volumes of historical texts can be supported by automatic tools that help historians reach more abstract and aggregated points of view. Tasks such as Information Extraction or Text Mining can be performed more efficiently when Machine Learning models are employed. We evaluate several state-of-the-art models on a new dataset for Named Entity Recognition. The dataset was built from a volume of History texts about General Güemes, an Argentinian national independence hero. The results show that some models perform better in terms of precision, recall and F1-score for most entity types. Specifically, pretrained language models fine-tuned for this particular task perform considerably better than classical models based on word embeddings and other kinds of representations and models. In addition, statistical tests are provided to assess the significance of the differences among the attained performance values. Hence, the contribution of this work is twofold: on the one hand, a new corpus and dataset for Named Entity Recognition; on the other hand, a complete statistical assessment of the performance of state-of-the-art models on the generated dataset.
Sociedad Argentina de Informática e Investigación Operativa
- Subject
- Computer Science
Named Entity Recognition and Classification
Argentinian History
Pretrained Language Models
- Accessibility level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/151702
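The abstract reports precision, recall and F1-score per entity type. As a minimal sketch of how such per-type scores are commonly computed, the Python below assumes BIO-tagged token sequences and exact span matching, in the spirit of the CoNLL evaluation convention. The record does not state the paper's tagging scheme, label set or scoring protocol, so the entity types PER, LOC and ORG and the function names `extract_entities` and `per_type_scores` are hypothetical.

```python
# Hypothetical sketch: entity-level precision/recall/F1 per entity type.
# Assumes BIO tags ("B-X", "I-X", "O") and exact-span matching; the label set
# (PER, LOC, ORG) is illustrative only, not taken from the paper.


def extract_entities(tags):
    """Return a set of (type, start, end) spans from a BIO tag sequence; end is exclusive."""
    entities = set()
    etype, start = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # the current entity keeps growing
        if etype is not None:
            entities.add((etype, start, i))  # close the open entity
            etype = None
        if tag != "O":
            # "B-X" starts an entity; a stray "I-X" is also treated as a start
            etype, start = tag[2:], i
    return entities


def per_type_scores(gold_tags, pred_tags):
    """Map each entity type to (precision, recall, F1) under exact span matching."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    scores = {}
    for etype in sorted({t for t, _, _ in gold | pred}):
        g = {e for e in gold if e[0] == etype}
        p = {e for e in pred if e[0] == etype}
        tp = len(g & p)  # predicted spans whose type and boundaries both match
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[etype] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
    pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]
    for etype, (p, r, f) in per_type_scores(gold, pred).items():
        print(f"{etype}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In this toy example the person span is matched exactly (P = R = F1 = 1.00), while the span tagged LOC in the gold data but ORG in the prediction counts as both a missed LOC and a spurious ORG, so both types score 0.00. Exact matching is strict; partial-overlap schemes would credit boundary errors differently.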
Full record metadata
id | SEDICI_c0e08642edfe0248c103a1005f680cca |
---|---|
oai_identifier_str | oai:sedici.unlp.edu.ar:10915/151702 |
network_acronym_str | SEDICI |
repository_id_str | 1329 |
network_name_str | SEDICI (UNLP) |
dc.title.none.fl_str_mv | Evaluation of Named Entity Recognition in Historical Argentinian Documents |
dc.creator.none.fl_str_mv | Darfe, Facundo; Xamena, Eduardo; Orozco, Carlos I. |
dc.subject.none.fl_str_mv | Computer Science; Named Entity Recognition and Classification; Argentinian History; Pretrained Language Models |
dc.date.none.fl_str_mv | 2022-10 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Conference object; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/151702 |
dc.language.none.fl_str_mv | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://publicaciones.sadio.org.ar/index.php/JAIIO/article/download/270/221; info:eu-repo/semantics/altIdentifier/issn/2451-7496 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.format.none.fl_str_mv | application/pdf; 98-109 |
dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |