UCT CS Research Document Archive

A Comparison of Machine Learning Techniques for Handwritten |Xam Word Recognition

Williams, Kyle, Hussein Suleman and Jorgina K. do R. Paihama (2013) A Comparison of Machine Learning Techniques for Handwritten |Xam Word Recognition. South African Institute for Computer Scientists and Information Technologists (SAICSIT'13), East London, South Africa.

Abstract

The Bleek and Lloyd collection contains 19th-century handwritten notebooks that document the language and culture of the |Xam-speaking people who lived in Southern Africa. Access to this rich data could be enhanced by transcriptions of the text; however, the complex diacritics used in the notebooks complicate the process of transcription. Machine learning techniques could be used to perform this transcription, but it is not known which techniques would produce the best results. This paper thus reports on a comparison of three popular techniques applied to this problem: artificial neural networks (ANN), hidden Markov models (HMM), and support vector machines (SVM). It was found that an SVM-based classifier using histograms of oriented gradients as features achieved the best word recognition accuracy, 58.4%. Furthermore, it was found that most feature extraction parameters did not have a large effect on recognition accuracy and that the SVM-based recognisers outperformed both the ANN- and HMM-based recognisers.

EPrint Type: Conference Proceedings
Keywords: OCR, handwriting recognition, cultural heritage preservation, Bleek and Lloyd Collection
Subjects: I Computing Methodologies: I.2 ARTIFICIAL INTELLIGENCE
          I Computing Methodologies: I.7 DOCUMENT AND TEXT PROCESSING
          H Information Systems: H.3 INFORMATION STORAGE AND RETRIEVAL
ID Code: 894
Deposited By: Paihama, J. K. do R.
Deposited On: 28 October 2013
Alternative Locations: http://dx.doi.org/10.1145/2513456.2513463