UCT CS Research Document Archive

Learning to Read Bushman

Williams, Kyle and Hussein Suleman (2010) Learning to Read Bushman. In Proceedings SAICSIT Postgraduate Symposium, Bela Bela.

The notebooks in the Bleek and Lloyd collection contain handwritten stories that metaphorically encode Bushman culture and are useful to researchers and scholars trying to understand Bushman language and culture. However, these notebooks exist only as scanned images, so the stories they contain cannot be searched, indexed or compared. This research investigates how accurately the Bushman stories can be automatically converted from images to text, a process known as transcription, and explores the various techniques for doing so. The expected contribution is a measurement of how accurately transcription can be performed automatically, together with a comparison of the different techniques.
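
The abstract does not state which accuracy measure the research uses. As an illustration only, one common way to score an automatic transcription against a manually produced reference is the character error rate: the edit distance between the two strings divided by the length of the reference. The sketch below assumes that measure; the function names `edit_distance` and `character_error_rate` are hypothetical and do not come from the paper.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions
    needed to turn `hypothesis` into `reference`."""
    # previous[j] is the distance between the reference prefix handled so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]  # reference[:i] versus an empty hypothesis: i deletions
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed accuracy measure: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Lower values mean a more accurate transcription, and 0.0 means the output matches the reference exactly. Python compares strings by Unicode code point, so a letter written with a combining diacritic counts as more than one character unless both strings are normalised to the same form first; whether that matters for the Bushman script is not something this record addresses.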

EPrint Type: Conference Poster
Keywords: OCR, machine learning, handwritten manuscripts, cultural heritage preservation, digital libraries
Subjects: I Computing Methodologies: I.2 ARTIFICIAL INTELLIGENCE
I Computing Methodologies: I.7 DOCUMENT AND TEXT PROCESSING
ID Code: 683
Deposited By: Williams, Kyle Mark
Deposited On: 23 March 2011