Manuscript document digitalization and recognition: a first approach

Authors
De Giusti, Marisa Raquel; Vila, María Marta; Villarreal, Gonzalo Luján
Publication year
2005
Language
English
Resource type
article
Status
submitted version
Description
Handwritten manuscript recognition belongs to a set of initiatives aimed at preserving the cultural heritage held in libraries and archives, which contain a great wealth of documents, including the handwritten cards that accompany incunabula. This work is the starting point of a research and development project on the digitalization and recognition of manuscript materials. The paper discusses different algorithms used in the first stage, which is dedicated to cleaning noise from the image in order to improve it before character recognition begins. For handwritten-text recognition and image digitalization to be efficient, they must be preceded by a preprocessing stage that includes thresholding, noise cleaning, thinning, baseline alignment and image segmentation, among other steps. Each of these steps reduces the harmful variability encountered when recognizing manuscripts (noise, random gray levels, slanted characters, uneven ink levels across zones) and so increases the probability of obtaining a suitable text recognition. In this paper, two image thinning methods are considered and implemented. Finally, an evaluation is carried out, yielding conclusions about efficiency, speed and requirements, as well as ideas for future implementations.
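The description names thresholding and thinning as preprocessing steps but does not say which two thinning methods the paper implements. The following is only a minimal sketch of those two steps, assuming an 8-bit grayscale image stored as a list of rows, dark ink on a light background, an arbitrary global threshold of 128, and the well-known Zhang-Suen algorithm as a stand-in thinning method. The function names are hypothetical and are not taken from the paper.

```python
def threshold(gray, t=128):
    """Binarize a grayscale image: 1 = ink (darker than t), 0 = background.
    The cut-off t=128 is an arbitrary assumption, not a value from the paper."""
    return [[1 if px < t else 0 for px in row] for row in gray]


def neighbours(img, r, c):
    """Return the 8 neighbours P2..P9, clockwise, starting from the pixel above."""
    return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
            img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]


def transitions(n):
    """Count 0 -> 1 transitions in the circular sequence P2, P3, ..., P9, P2."""
    return sum(a == 0 and b == 1 for a, b in zip(n, n[1:] + n[:1]))


def zhang_suen(binary):
    """Thin a binary image with the Zhang-Suen algorithm (an assumed stand-in).
    Border pixels are never modified, so the input is expected to have at
    least a one-pixel background margin."""
    img = [row[:] for row in binary]  # work on a copy
    rows, cols = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    n = neighbours(img, r, c)
                    p2, p3, p4, p5, p6, p7, p8, p9 = n
                    if not 2 <= sum(n) <= 6:
                        continue
                    if transitions(n) != 1:
                        continue
                    if step == 0:
                        removable = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        removable = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if removable:
                        to_clear.append((r, c))
            # Delete all flagged pixels at once, after the scan of this sub-step.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


if __name__ == "__main__":
    # A 3-pixel-thick horizontal stroke on a white page.
    page = [[255] * 12 for _ in range(7)]
    for r in range(2, 5):
        for c in range(2, 10):
            page[r][c] = 0
    for row in zhang_suen(threshold(page)):
        print("".join("#" if px else "." for px in row))
```

Thinning runs on the binarized image, so the quality of the skeleton depends on the threshold chosen in the first step. A real pipeline would also include the noise cleaning, baseline alignment and segmentation steps listed in the description.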
Subject
Computer and Information Sciences
digitization
Image processing software
heritage conservation
Access level
open access
Terms of use
http://creativecommons.org/licenses/by/4.0/
Repository
CIC Digital (CICBA)
Institution
Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
OAI identifier
oai:digital.cic.gba.gob.ar:11746/3826

Publication date
2005-10-01
Format
application/pdf
URL
https://digital.cic.gba.gob.ar/handle/11746/3826
Repository contact
marisa.degiusti@sedici.unlp.edu.ar