Manuscript document digitalization and recognition: a first approach

Autores: De Giusti, Marisa Raquel; Vila, María Marta; Villarreal, Gonzalo Luján
Año de publicación: 2005
Idioma: inglés
Tipo de recurso: artículo
Estado: versión publicada
Descripción: The handwritten manuscript recognizing process belongs to a set of initiatives which lean to the preservation of cultural patrimony gathered in libraries and archives, where there exist a great wealth in documents and even handwritten cards that accompany incunabula books. This work is the starting point of a research and development project oriented to digitalization and recognition of manuscript materials. The paper presented here discuss different algorithms used in the first stage dedicated to "image noise-cleaning" in order to improve it before the character recognition process begins. In order to make the handwritten-text recognition and image digitalization process efficient, it must be preceded by a preprocessing stage of the image to be treated, which includes thresholding, noise cleaning, thinning, base-line alignment and image segmentation, among others. Each of these steps will allow us to reduce the injurious variability when recognizing manuscripts (noise, random gray levels, slanted characters, ink level in different zones), and so increasing the probability of obtaining a suitable text recognition. In this paper, two image thinning methods are considered, and implemented. Finally, an evaluation is carried out obtaining many conclusions related to efficiency, speed and requirements, as well as ideas for future implementations.
Dirección PREBI-SEDICI
Materia: Ciencias Informáticas
Image processing software
digitalización
conservación patrimonial
Nivel de accesibilidad: acceso abierto
Condiciones de uso: http://creativecommons.org/licenses/by/3.0/
Repositorio
Institución: Universidad Nacional de La Plata
OAI Identificador: oai:sedici.unlp.edu.ar:10915/5521

Acceder

id	SEDICI_d0a780e7ff0240ba138b8df08e2c4de1
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/5521
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
spelling	Manuscript document digitalization and recognition: a first approachDe Giusti, Marisa RaquelVila, María MartaVillarreal, Gonzalo LujánCiencias InformáticasImage processing softwaredigitalizaciónconservación patrimonialThe handwritten manuscript recognizing process belongs to a set of initiatives which lean to the preservation of cultural patrimony gathered in libraries and archives, where there exist a great wealth in documents and even handwritten cards that accompany incunabula books. This work is the starting point of a research and development project oriented to digitalization and recognition of manuscript materials. The paper presented here discuss different algorithms used in the first stage dedicated to "image noise-cleaning" in order to improve it before the character recognition process begins. In order to make the handwritten-text recognition and image digitalization process efficient, it must be preceded by a preprocessing stage of the image to be treated, which includes thresholding, noise cleaning, thinning, base-line alignment and image segmentation, among others. Each of these steps will allow us to reduce the injurious variability when recognizing manuscripts (noise, random gray levels, slanted characters, ink level in different zones), and so increasing the probability of obtaining a suitable text recognition. In this paper, two image thinning methods are considered, and implemented. Finally, an evaluation is carried out obtaining many conclusions related to efficiency, speed and requirements, as well as ideas for future implementations.Dirección PREBI-SEDICI2005-10info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionArticulohttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdf158-163http://sedici.unlp.edu.ar/handle/10915/5521enginfo:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Oct05-7.pdfinfo:eu-repo/semantics/altIdentifier/issn/1666-6038info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by/3.0/Creative Commons Attribution 3.0 Unported (CC BY 3.0)reponame:SEDICI (UNLP)instname:Universidad Nacional de La Platainstacron:UNLP2026-05-27T10:44:22Zoai:sedici.unlp.edu.ar:10915/5521Institucionalhttp://sedici.unlp.edu.ar/Universidad públicaNo correspondehttp://sedici.unlp.edu.ar/oai/snrdalira@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:13292026-05-27 10:44:22.871SEDICI (UNLP) - Universidad Nacional de La Platafalse
dc.title.none.fl_str_mv	Manuscript document digitalization and recognition: a first approach
title	Manuscript document digitalization and recognition: a first approach
spellingShingle	Manuscript document digitalization and recognition: a first approach De Giusti, Marisa Raquel Ciencias Informáticas Image processing software digitalización conservación patrimonial
title_short	Manuscript document digitalization and recognition: a first approach
title_full	Manuscript document digitalization and recognition: a first approach
title_fullStr	Manuscript document digitalization and recognition: a first approach
title_full_unstemmed	Manuscript document digitalization and recognition: a first approach
title_sort	Manuscript document digitalization and recognition: a first approach
dc.creator.none.fl_str_mv	De Giusti, Marisa Raquel Vila, María Marta Villarreal, Gonzalo Luján
author	De Giusti, Marisa Raquel
author_facet	De Giusti, Marisa Raquel Vila, María Marta Villarreal, Gonzalo Luján
author_role	author
author2	Vila, María Marta Villarreal, Gonzalo Luján
author2_role	author author
dc.subject.none.fl_str_mv	Ciencias Informáticas Image processing software digitalización conservación patrimonial
topic	Ciencias Informáticas Image processing software digitalización conservación patrimonial
dc.description.none.fl_txt_mv	The handwritten manuscript recognizing process belongs to a set of initiatives which lean to the preservation of cultural patrimony gathered in libraries and archives, where there exist a great wealth in documents and even handwritten cards that accompany incunabula books. This work is the starting point of a research and development project oriented to digitalization and recognition of manuscript materials. The paper presented here discuss different algorithms used in the first stage dedicated to "image noise-cleaning" in order to improve it before the character recognition process begins. In order to make the handwritten-text recognition and image digitalization process efficient, it must be preceded by a preprocessing stage of the image to be treated, which includes thresholding, noise cleaning, thinning, base-line alignment and image segmentation, among others. Each of these steps will allow us to reduce the injurious variability when recognizing manuscripts (noise, random gray levels, slanted characters, ink level in different zones), and so increasing the probability of obtaining a suitable text recognition. In this paper, two image thinning methods are considered, and implemented. Finally, an evaluation is carried out obtaining many conclusions related to efficiency, speed and requirements, as well as ideas for future implementations. Dirección PREBI-SEDICI
description	The handwritten manuscript recognizing process belongs to a set of initiatives which lean to the preservation of cultural patrimony gathered in libraries and archives, where there exist a great wealth in documents and even handwritten cards that accompany incunabula books. This work is the starting point of a research and development project oriented to digitalization and recognition of manuscript materials. The paper presented here discuss different algorithms used in the first stage dedicated to "image noise-cleaning" in order to improve it before the character recognition process begins. In order to make the handwritten-text recognition and image digitalization process efficient, it must be preceded by a preprocessing stage of the image to be treated, which includes thresholding, noise cleaning, thinning, base-line alignment and image segmentation, among others. Each of these steps will allow us to reduce the injurious variability when recognizing manuscripts (noise, random gray levels, slanted characters, ink level in different zones), and so increasing the probability of obtaining a suitable text recognition. In this paper, two image thinning methods are considered, and implemented. Finally, an evaluation is carried out obtaining many conclusions related to efficiency, speed and requirements, as well as ideas for future implementations.
publishDate	2005
dc.date.none.fl_str_mv	2005-10
dc.type.none.fl_str_mv	info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion Articulo http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo
format	article
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/5521
url	http://sedici.unlp.edu.ar/handle/10915/5521
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Oct05-7.pdf info:eu-repo/semantics/altIdentifier/issn/1666-6038
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/3.0/ Creative Commons Attribution 3.0 Unported (CC BY 3.0)
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by/3.0/ Creative Commons Attribution 3.0 Unported (CC BY 3.0)
dc.format.none.fl_str_mv	application/pdf 158-163
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP
reponame_str	SEDICI (UNLP)
collection	SEDICI (UNLP)
instname_str	Universidad Nacional de La Plata
instacron_str	UNLP
institution	UNLP
repository.name.fl_str_mv	SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar
_version_	1866371160338333696
score	13.040872

Manuscript document digitalization and recognition: a first approach

Publicaciones similares