2024-03-29T04:59:49Z
https://pubs.cs.uct.ac.za/cgi/oai2
oai:pubs.cs.uct.ac.za:978
2019-10-10T15:32:47Z
7375626A656374733D3130303033313230
7375626A656374733D3130303032393531:3130303033333137
74797065733D6A6F75726E616C70
https://pubs.cs.uct.ac.za/id/eprint/978/
A System for High Quality Crowdsourced Indigenous Language Transcription
Munyaradzi, Ngoni
Suleman, Hussein
Human-centered computing
Information retrieval
In this article, a crowdsourcing method is proposed to transcribe manuscripts from the Bleek and Lloyd Collection, where non-expert volunteers transcribe pages of the handwritten text using an online tool. The digital Bleek and Lloyd Collection is a rare collection that contains artwork, notebooks and dictionaries of the indigenous people of Southern Africa. The notebooks, in particular, contain stories that encode the language, culture and beliefs of these people, handwritten in now-extinct languages with a specialised notation system. Previous attempts have been made to convert the approximately 20000 pages of text to a machine-readable form using machine learning algorithms but, due to the complexity of the text, the recognition accuracy was low. This article presents details of the system used to enable transcription by volunteers as well as results from experiments that were conducted to determine the quality and consistency of transcriptions. The results show that volunteers are able to produce reliable transcriptions of high quality. The inter-transcriber agreement is 80% for |Xam text and 95% for English text. When the |Xam text transcriptions produced by the volunteers are compared with a gold standard, the volunteers achieve an average accuracy of 64.75%, which exceeds that in previous work. Finally, the degree of transcription agreement correlates with the degree of transcription accuracy. This suggests that the quality of unseen data can be assessed based on the degree of agreement among transcribers.
Springer
2014
Journal article (paginated)
application/pdf
en
https://pubs.cs.uct.ac.za/id/eprint/978/1/ijdl_2013_transcription.pdf
Munyaradzi, Ngoni and Suleman, Hussein (2014) A System for High Quality Crowdsourced Indigenous Language Transcription, International Journal on Digital Libraries, 14, 117-125, Springer.