Manuscript document digitalization and recognition: a first approach
- Autores
- De Giusti, Marisa Raquel; Vila, María Marta; Villarreal, Gonzalo Luján
- Año de publicación
- 2005
- Idioma
- inglés
- Tipo de recurso
- artículo
- Estado
- versión publicada
- Descripción
- The handwritten manuscript recognizing process belongs to a set of initiatives which lean to the preservation of cultural patrimony gathered in libraries and archives, where there exist a great wealth in documents and even handwritten cards that accompany incunabula books. This work is the starting point of a research and development project oriented to digitalization and recognition of manuscript materials. The paper presented here discuss different algorithms used in the first stage dedicated to "image noise-cleaning" in order to improve it before the character recognition process begins. In order to make the handwritten-text recognition and image digitalization process efficient, it must be preceded by a preprocessing stage of the image to be treated, which includes thresholding, noise cleaning, thinning, base-line alignment and image segmentation, among others. Each of these steps will allow us to reduce the injurious variability when recognizing manuscripts (noise, random gray levels, slanted characters, ink level in different zones), and so increasing the probability of obtaining a suitable text recognition. In this paper, two image thinning methods are considered, and implemented. Finally, an evaluation is carried out obtaining many conclusions related to efficiency, speed and requirements, as well as ideas for future implementations.
Dirección PREBI-SEDICI - Materia
-
Ciencias Informáticas
Image processing software
digitalización
conservación patrimonial - Nivel de accesibilidad
- acceso abierto
- Condiciones de uso
- http://creativecommons.org/licenses/by/3.0/
- Repositorio
- Institución
- Universidad Nacional de La Plata
- OAI Identificador
- oai:sedici.unlp.edu.ar:10915/5521
Ver los metadatos del registro completo
id |
SEDICI_d0a780e7ff0240ba138b8df08e2c4de1 |
---|---|
oai_identifier_str |
oai:sedici.unlp.edu.ar:10915/5521 |
network_acronym_str |
SEDICI |
repository_id_str |
1329 |
network_name_str |
SEDICI (UNLP) |
spelling |
Manuscript document digitalization and recognition: a first approachDe Giusti, Marisa RaquelVila, María MartaVillarreal, Gonzalo LujánCiencias InformáticasImage processing softwaredigitalizaciónconservación patrimonialThe handwritten manuscript recognizing process belongs to a set of initiatives which lean to the preservation of cultural patrimony gathered in libraries and archives, where there exist a great wealth in documents and even handwritten cards that accompany incunabula books. This work is the starting point of a research and development project oriented to digitalization and recognition of manuscript materials. The paper presented here discuss different algorithms used in the first stage dedicated to "image noise-cleaning" in order to improve it before the character recognition process begins. In order to make the handwritten-text recognition and image digitalization process efficient, it must be preceded by a preprocessing stage of the image to be treated, which includes thresholding, noise cleaning, thinning, base-line alignment and image segmentation, among others. Each of these steps will allow us to reduce the injurious variability when recognizing manuscripts (noise, random gray levels, slanted characters, ink level in different zones), and so increasing the probability of obtaining a suitable text recognition. In this paper, two image thinning methods are considered, and implemented. Finally, an evaluation is carried out obtaining many conclusions related to efficiency, speed and requirements, as well as ideas for future implementations.Dirección PREBI-SEDICI2005-10info:eu-repo/semantics/articleinfo:eu-repo/semantics/publishedVersionArticulohttp://purl.org/coar/resource_type/c_6501info:ar-repo/semantics/articuloapplication/pdf158-163http://sedici.unlp.edu.ar/handle/10915/5521enginfo:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Oct05-7.pdfinfo:eu-repo/semantics/altIdentifier/issn/1666-6038info:eu-repo/semantics/openAccesshttp://creativecommons.org/licenses/by/3.0/Creative Commons Attribution 3.0 Unported (CC BY 3.0)reponame:SEDICI (UNLP)instname:Universidad Nacional de La Platainstacron:UNLP2025-09-29T10:49:41Zoai:sedici.unlp.edu.ar:10915/5521Institucionalhttp://sedici.unlp.edu.ar/Universidad públicaNo correspondehttp://sedici.unlp.edu.ar/oai/snrdalira@sedici.unlp.edu.arArgentinaNo correspondeNo correspondeNo correspondeopendoar:13292025-09-29 10:49:41.747SEDICI (UNLP) - Universidad Nacional de La Platafalse |
dc.title.none.fl_str_mv |
Manuscript document digitalization and recognition: a first approach |
title |
Manuscript document digitalization and recognition: a first approach |
spellingShingle |
Manuscript document digitalization and recognition: a first approach De Giusti, Marisa Raquel Ciencias Informáticas Image processing software digitalización conservación patrimonial |
title_short |
Manuscript document digitalization and recognition: a first approach |
title_full |
Manuscript document digitalization and recognition: a first approach |
title_fullStr |
Manuscript document digitalization and recognition: a first approach |
title_full_unstemmed |
Manuscript document digitalization and recognition: a first approach |
title_sort |
Manuscript document digitalization and recognition: a first approach |
dc.creator.none.fl_str_mv |
De Giusti, Marisa Raquel Vila, María Marta Villarreal, Gonzalo Luján |
author |
De Giusti, Marisa Raquel |
author_facet |
De Giusti, Marisa Raquel Vila, María Marta Villarreal, Gonzalo Luján |
author_role |
author |
author2 |
Vila, María Marta Villarreal, Gonzalo Luján |
author2_role |
author author |
dc.subject.none.fl_str_mv |
Ciencias Informáticas Image processing software digitalización conservación patrimonial |
topic |
Ciencias Informáticas Image processing software digitalización conservación patrimonial |
dc.description.none.fl_txt_mv |
The handwritten manuscript recognizing process belongs to a set of initiatives which lean to the preservation of cultural patrimony gathered in libraries and archives, where there exist a great wealth in documents and even handwritten cards that accompany incunabula books. This work is the starting point of a research and development project oriented to digitalization and recognition of manuscript materials. The paper presented here discuss different algorithms used in the first stage dedicated to "image noise-cleaning" in order to improve it before the character recognition process begins. In order to make the handwritten-text recognition and image digitalization process efficient, it must be preceded by a preprocessing stage of the image to be treated, which includes thresholding, noise cleaning, thinning, base-line alignment and image segmentation, among others. Each of these steps will allow us to reduce the injurious variability when recognizing manuscripts (noise, random gray levels, slanted characters, ink level in different zones), and so increasing the probability of obtaining a suitable text recognition. In this paper, two image thinning methods are considered, and implemented. Finally, an evaluation is carried out obtaining many conclusions related to efficiency, speed and requirements, as well as ideas for future implementations. Dirección PREBI-SEDICI |
description |
The handwritten manuscript recognizing process belongs to a set of initiatives which lean to the preservation of cultural patrimony gathered in libraries and archives, where there exist a great wealth in documents and even handwritten cards that accompany incunabula books. This work is the starting point of a research and development project oriented to digitalization and recognition of manuscript materials. The paper presented here discuss different algorithms used in the first stage dedicated to "image noise-cleaning" in order to improve it before the character recognition process begins. In order to make the handwritten-text recognition and image digitalization process efficient, it must be preceded by a preprocessing stage of the image to be treated, which includes thresholding, noise cleaning, thinning, base-line alignment and image segmentation, among others. Each of these steps will allow us to reduce the injurious variability when recognizing manuscripts (noise, random gray levels, slanted characters, ink level in different zones), and so increasing the probability of obtaining a suitable text recognition. In this paper, two image thinning methods are considered, and implemented. Finally, an evaluation is carried out obtaining many conclusions related to efficiency, speed and requirements, as well as ideas for future implementations. |
publishDate |
2005 |
dc.date.none.fl_str_mv |
2005-10 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion Articulo http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://sedici.unlp.edu.ar/handle/10915/5521 |
url |
http://sedici.unlp.edu.ar/handle/10915/5521 |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Oct05-7.pdf info:eu-repo/semantics/altIdentifier/issn/1666-6038 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/3.0/ Creative Commons Attribution 3.0 Unported (CC BY 3.0) |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
http://creativecommons.org/licenses/by/3.0/ Creative Commons Attribution 3.0 Unported (CC BY 3.0) |
dc.format.none.fl_str_mv |
application/pdf 158-163 |
dc.source.none.fl_str_mv |
reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP |
reponame_str |
SEDICI (UNLP) |
collection |
SEDICI (UNLP) |
instname_str |
Universidad Nacional de La Plata |
instacron_str |
UNLP |
institution |
UNLP |
repository.name.fl_str_mv |
SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv |
alira@sedici.unlp.edu.ar |
_version_ |
1844615750702071808 |
score |
13.070432 |