A Comparison of Machine Learning Techniques for Handwritten |Xam Word Recognition

Williams, Kyle and Suleman, Hussein and Paihama, Jorgina K. do R. (2013) A Comparison of Machine Learning Techniques for Handwritten |Xam Word Recognition, South African Institute for Computer Scientists and Information Technologists (SAICSIT'13), October 07-09 2013, East London, South Africa, ACM.


Abstract

The Bleek and Lloyd collection contains 19th-century handwritten notebooks that document the language and culture of the |Xam-speaking people who lived in Southern Africa. Access to this rich data could be enhanced by transcriptions of the text; however, the complex diacritics used in the notebooks complicate the process of transcription. Machine learning techniques could be used to perform this transcription, but it is not known which techniques would produce the best results. This paper thus reports on a comparison of three popular techniques applied to this problem: artificial neural networks (ANN), hidden Markov models (HMM), and support vector machines (SVM). It was found that an SVM-based classifier using histograms of oriented gradients as features resulted in the best word recognition accuracy of 58.4%. Furthermore, it was found that most feature extraction parameters did not have a large effect on recognition accuracy and that the SVM-based recognisers outperformed both the ANN- and HMM-based recognisers.
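
As a rough illustration of the histogram-of-oriented-gradients features mentioned in the abstract, the Python sketch below computes a single magnitude-weighted orientation histogram over a small greyscale image. It is a simplified illustration and not the paper's feature extractor: the function name `orientation_histogram`, the 9-bin unsigned (0 to 180 degree) layout, and the central-difference gradients are all assumptions made here. A full HOG descriptor would also split the image into cells and normalise over blocks before the vector is passed to a classifier such as an SVM.

```python
import math


def orientation_histogram(image, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `image` is a list of equal-length rows of grey levels. This treats the
    whole image as one cell (an assumption for brevity); border pixels are
    skipped because central differences need both neighbours.
    """
    rows, cols = len(image), len(image[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central-difference gradients in x and y.
            gx = image[r][c + 1] - image[r][c - 1]
            gy = image[r + 1][c] - image[r - 1][c]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region: no orientation to vote for
            # Unsigned orientation folded into [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle // bin_width), bins - 1)
            hist[index] += magnitude
    # L2-normalise so the feature is less sensitive to overall contrast.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# A vertical edge has purely horizontal gradients (orientation 0 degrees).
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
features = orientation_histogram(vertical_edge)
```

In this toy input every interior pixel votes for the first bin, so the normalised vector has all of its weight there; real word images would spread weight across bins according to stroke direction.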

Item Type: Conference proceedings
Uncontrolled Keywords: OCR, handwriting recognition, cultural heritage preservation, Bleek and Lloyd Collection
Subjects: Computing methodologies > Artificial intelligence
Applied computing > Document management and text processing
Information systems > Information retrieval
Alternate Locations: http://dx.doi.org/10.1145/2513456.2513463
Date Deposited: 28 Oct 2013
Last Modified: 10 Oct 2019 15:32
URI: http://pubs.cs.uct.ac.za/id/eprint/894
