UCT CS Research Document Archive

Creating a Handwriting Recognition Corpus for Bushman Languages

Williams, Kyle and Hussein Suleman (2011) Creating a Handwriting Recognition Corpus for Bushman Languages. In Xing, Chunxiao, Fabio Crestani and Andreas Rauber, Eds. Proceedings of the 13th International Conference on Asia-Pacific Digital Libraries, pages 222-231, Beijing, P.R. China.


Abstract

Handwriting recognition systems rely on the existence of a corpus for training recognition models and evaluating accuracy. Creating a handwriting recognition corpus for the Bushman languages of southern Africa is difficult due to the complexities of the script used to represent them and the fact that this script cannot be represented using Unicode. To solve this problem, a semi-automatic Web-based tool was developed to segment, capture and encode the Bushman text. A case study demonstrated how the tool could be used to create a Bushman handwriting corpus with few errors.

EPrint Type: Conference Paper
Keywords: Corpus creation, transcription, digital libraries
Subjects: I Computing Methodologies: I.7 DOCUMENT AND TEXT PROCESSING; H Information Systems: H.0 GENERAL
ID Code: 734
Deposited By: Williams, Kyle Mark
Deposited On: 18 November 2011
Alternative Locations: http://www.springerlink.com/content/kpm241680l23m240/