Williams, Kyle and Suleman, Hussein and Paihama, Jorgina K. do R. (2013) A Comparison of Machine Learning Techniques for Handwritten |Xam Word Recognition, South African Institute for Computer Scientists and Information Technologists (SAICSIT'13), October 07-09 2013, East London, South Africa, ACM.
Abstract
The Bleek and Lloyd collection contains 19th century handwritten notebooks that document the language and culture of the |Xam-speaking people who lived in Southern Africa. Access to this rich data could be enhanced by transcriptions of the text; however, the complex diacritics used in the notebooks complicate the process of transcription. Machine learning techniques could be used to perform this transcription, but it is not known which techniques would produce the best results. This paper thus reports on a comparison of three popular techniques applied to this problem: artificial neural networks (ANN), hidden Markov models (HMM), and support vector machines (SVM). It was found that an SVM-based classifier using histograms of oriented gradients as features achieved the best word recognition accuracy, 58.4%. Furthermore, it was found that most feature extraction parameters did not have a large effect on recognition accuracy and that the SVM-based recognisers outperformed both the ANN- and HMM-based recognisers.
Item Type: | Conference proceedings |
---|---|
Uncontrolled Keywords: | OCR, handwriting recognition, cultural heritage preservation, Bleek and Lloyd Collection |
Subjects: | Computing methodologies > Artificial intelligence; Applied computing > Document management and text processing; Information systems > Information retrieval |
Alternate Locations: | http://dx.doi.org/10.1145/2513456.2513463 |
Date Deposited: | 28 Oct 2013 |
Last Modified: | 10 Oct 2019 15:32 |
URI: | http://pubs.cs.uct.ac.za/id/eprint/894 |