TY - GEN
T1 - OCR for Greek polytonic (multi accent) historical printed documents
T2 - 3rd International Conference on Digital Access to Textual Cultural Heritage, DATeCH 2019
AU - Sichani, Anna Maria
AU - Kaddas, Panagiotis
AU - Mikros, Georgios K.
AU - Gatos, Basilis
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/5/8
Y1 - 2019/5/8
AB - This paper presents the development and implementation of a robust OCR tool and a related comprehensive workflow for the recognition of Greek printed polytonic scripts. This project was initiated and developed by an interdisciplinary team with expertise in document image processing, character segmentation and recognition, machine learning, corpus creation and digital humanities. Our paper aims to describe the design and development of the workflow around this project, including data gathering and structuring, OCR tool development, user interface development, experiments on the training procedure of the tool, evaluation, post-correction and quality control of the results.
KW - Greek polytonic scripts
KW - Historical printed documents
KW - Image processing
KW - Machine learning
KW - Optical character recognition
KW - Page segmentation
KW - Post correction workflow
UR - https://www.scopus.com/pages/publications/85074879157
U2 - 10.1145/3322905.3322926
DO - 10.1145/3322905.3322926
M3 - Conference contribution
AN - SCOPUS:85074879157
T3 - ACM International Conference Proceeding Series
SP - 9
EP - 13
BT - 3rd International Conference on Digital Access to Textual Cultural Heritage, DATeCH 2019 - Conference Proceedings
PB - Association for Computing Machinery
Y2 - 8 May 2019 through 10 May 2019
ER -